Abstract

Evolutionary  algorithms  (EAs)  are  stochastic  population-based  algorithms  inspired  by  the  natural 
processes of selection, mutation, and recombination. EAs are often employed as optimum seeking techniques.
A  formal  framework  for  EAs  is  proposed,  in  which  evolutionary  operators  are  viewed  as  mappings  from 
parameter  spaces  to  spaces  of  random  functions.  Formal  definitions  within  this  framework  capture  the 
distinguishing  characteristics  of  the  classes  of  recombination,  mutation,  and  selection  operators.  EAs  which 
use  strictly  invariant  selection  operators  and  order  invariant  representation  schemes  comprise  the  class  of 
linkage-friendly genetic algorithms (lfGAs).

Fast messy genetic algorithms (fmGAs) are lfGAs which use binary tournament selection (BTS) with
thresholding, periodic filtering of a fixed number of randomly selected genes from each individual, and
generalized single-point crossover. Probabilistic variants of thresholding and filtering are proposed. EAs
using the probabilistic operators are generalized fmGAs (gfmGAs).

A dynamical systems model of lfGAs is developed which permits prediction of expected effectiveness.
BTS  with  probabilistic  thresholding  is  modeled  at  various  levels  of  abstraction  as  a  Markov  chain.  Transitions 
at  the  most  detailed  level  involve  decisions  between  classes  of  individuals.  The  probability  of  correct  decision 
making  is  related  to  appropriate  maximal  order  statistics,  the  distributions  of  which  are  obtained.  Existing 
filtering  models  are  extended  to  include  probabilistic  individual  lengths. 

Sensitivity of lfGA effectiveness to exogenous parameters limits practical applications. The lfGA
parameter selection problem is formally posed as a constrained optimization problem in which the cost
functional is related to expected effectiveness. Kuhn-Tucker conditions for the optimality of gfmGA parameters
are derived. Parameter selection techniques are proposed for fmGAs and gfmGAs.
I.  Introduction

1.1  Optimization 

Many problems in science, engineering, and operations research may be viewed as optimization problems.
Informally, an optimization problem involves a number of alternatives, and the solution to the problem
is the set of alternatives which maximize or minimize some criterion. Examples include determination of the
minimum energy state of a large biomolecule, design of a jet engine with maximum thrust-to-weight ratio,
and minimization of the total distance traveled in visiting a set of destinations. Frequently, the alternatives
are also subject to one or more constraints. In the present examples, certain atoms of the biomolecule may
form a ring system, there may be a maximum allowable cost for the engine, and repeated visits to a particular
destination may be prohibited. This last example is the famous Traveling Salesperson Problem (TSP) [26].
Optimization problems are defined formally in Chapter II.

Often, the set of alternatives is isomorphic to a region^ (or union of several regions) of $\mathbb{R}^n$. For example,
in identifying the minimum energy state of a large biomolecule, the possible arrangements of the constituent
atoms are determined by some subset of the molecule’s dihedral angles (making the common simplifying
assumption that all bond lengths and bond angles, and possibly some of the dihedral angles, are fixed at
their “equilibrium” values [73]). Each angle takes on values in the interval $(-\pi, \pi] \subset \mathbb{R}$, so the set of possible
states is isomorphic to $(-\pi, \pi]^n \subset \mathbb{R}^n$, where $n$ is the number of variable dihedral angles.

In contrast, the set of alternatives for a combinatoric optimization problem is discrete (i.e. finite or
countably infinite). The previous example may be further simplified by assuming that each variable dihedral
angle assumes values in a finite set of angles, so that the set of alternatives is isomorphic to the $n$-fold
Cartesian product of that set. The resulting optimization problem is combinatoric. Likewise, the set of
alternatives in the TSP is the set of permutations over the set of destinations. This set is of course finite,
although possibly quite large.

^ Apostol [3] defines “region” as follows: “A set in $\mathbb{R}^n$ is called a region if it is the union of an open connected set with some,
none, or all its boundary points.”

There is only one general method guaranteed to obtain the optimum alternative of an arbitrary
optimization problem, that being exhaustive search [63] (pure random search being a special case thereof). For
many optimization problems of practical interest, the number of alternatives prohibits their enumeration.
Thus the practitioner is often forced to settle for solutions which are “good enough,” even though they are
not optimal. When the problem is posed as such, it may be referred to as a semi-optimization problem [62],
and the solution techniques employed are called optimum-seeking techniques. Even finding an “acceptable”
solution may require exploring a significant part of a large search space.
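The cost of exhaustive search can be made concrete with a small sketch. The code below is an illustration only, not a method from this research: it assumes a symmetric Euclidean TSP, fixes the first destination to avoid counting rotations of the same tour, and uses hypothetical names (`tour_length`, `exhaustive_tsp`, `square`).

```python
from itertools import permutations
from math import dist, factorial


def tour_length(order, coords):
    """Total length of the closed tour that visits coords in the given order."""
    n = len(order)
    return sum(dist(coords[order[i]], coords[order[(i + 1) % n]]) for i in range(n))


def exhaustive_tsp(coords):
    """Enumerate every tour starting at destination 0 and return the shortest."""
    n = len(coords)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda order: tour_length(order, coords))
    return best, tour_length(best, coords)


# Hypothetical instance: four destinations at the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_order, best_len = exhaustive_tsp(square)
print(best_order, best_len)

# Number of tours the same enumeration would visit for only 20 destinations.
print(factorial(20 - 1))
```

Four destinations require only six evaluations, but the count grows factorially with the number of destinations, which is why enumeration is impractical for problems of realistic size.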

The various techniques used to solve optimization problems are part of what has come to be a unified
theory of optimization. Some techniques have been known and used for centuries, while the advent of the
high  speed  digital  computer  has  enabled  the  application  of  other  techniques  which  were  previously  impractical 
due  to  the  large  number  of  calculations  required.  It  also  brought  about  the  development  of  entirely  new 
techniques  designed  explicitly  to  exploit  the  strengths  of  the  digital  computer.  Still,  problem  sizes  are  limited 
by  processor  speed  and  memory  size,  and  the  relative  efficiency  and  effectiveness  of  existing  algorithms  are 
not  necessarily  preserved  by  hardware  advances. 

This last observation is especially important as multiprocessor architectures (parallel, distributed,
or  otherwise)  become  more  prevalent.  Realization  of  the  effectiveness  and  efficiency  improvements  made 
possible  by  such  architectures  depends  on  the  design  and  use  of  appropriate  multiprocessor  algorithms.  Just 
as  the  arrival  of  single  processor  computers  spawned  the  development  of  new  techniques,  multiprocessor 
architectures  invite  the  investigation  of  a  new  set  of  optimization  tools. 

1.2  Evolutionary  Algorithms 

A promising set of candidates for such investigation are a class of algorithms inspired by the principles
of evolution, known appropriately as evolutionary algorithms. These techniques operate by applying
biologically-inspired operators, such as recombination, mutation, and selection, to a population of individuals,
each of which represents a candidate solution (alternative). Their use as optimum seeking techniques
derives from the resulting analogy to the principle of “survival of the fittest.”

Because  of  their  population  based  approach,  evolutionary  algorithms  are  well  suited  for  implementation 
on  multiprocessor  systems.  One  example  which  receives  significant  attention  in  the  literature  in  this  regard 
is the simple genetic algorithm. The fast messy genetic algorithm is another. It is representative of a class of
evolutionary algorithms which are called linkage-friendly genetic algorithms in this research. Each of these
algorithms is defined precisely in Chapter II.

Linkage-friendly genetic algorithms are potentially more effective and efficient than the simple genetic
algorithm for a large class of optimization problems, but their properties are not well understood. In particular,
the fast messy genetic algorithm is demonstrated in limited applications to pedagogical problems [35]
to be an effective and efficient optimum seeking technique. However, its practical use is limited by its dependence
on a large number of exogenous parameters. Specifically, its effectiveness depends strongly on its many
“filtering” and “thresholding” parameters. Currently available parameter selection methodologies [35, 46, 47]
are ad hoc in nature and do not reliably result in satisfactory effectiveness.

1.3  Problem Statement and Approach

The  primary  objectives  of  this  research  are  to 

•  mathematically  model  those  properties  of  specific  linkage-friendly  genetic  algorithms  which  are  related 
to  expected  effectiveness;  and 

•  develop  exogenous  parameter  selection  techniques  for  those  linkage-friendly  genetic  algorithms,  focusing 
on  maximizing  their  expected  effectiveness. 

Linkage-friendly  genetic  algorithms  are  modeled  as  dynamical  systems,  and  expected  effectiveness 
is  defined  as  a  deterministic  function  of  the  system  state.  The  state  transitions  are  determined  by  the 
specific operators used by the algorithm, and the parameters of those operators. The operators modeled
are generalizations of the fast messy genetic algorithm’s “building block filtering” and “binary tournament
selection with thresholding” operators.

The  dynamical  systems  model  predicts  the  expected  effectiveness  resulting  from  a  particular  choice  of 
filtering  and  thresholding  parameters.  Consequently,  the  parameter  selection  problem  may  be  posed  as  an 
optimization  problem.  The  set  of  alternatives  is  the  permissible  set  of  filtering  and  thresholding  parameters, 
and  the  criterion  to  be  maximized  is  expected  effectiveness.  Taking  this  perspective,  this  research  develops 
exogenous  parameter  selection  techniques  based  on  standard  optimum  seeking  techniques.  A  parameter 
selection  technique  is  considered  acceptable  if  it  satisfies  the  following  criteria: 

1.  the technique guarantees expected effectiveness no worse than that resulting from the best set of
parameters obtained using existing techniques;

2.  the technique requires no a priori knowledge of the optimal solution;

3.  the technique requires no design parameters beyond those of the linkage-friendly genetic algorithm;
and

4.  the computational effort required by the technique scales well with the effort required by the
linkage-friendly genetic algorithm.

1.4  Organization of the Dissertation

The background necessary to fully define the problem outlined in Section 1.3 is provided in Chapter II,
beginning  with  a  brief  introduction  to  optimization  theory  and  development  of  a  general  framework  for 
evolutionary  algorithms.  The  remainder  of  the  chapter  introduces  simple  genetic  algorithms,  then  focuses 
on  linkage-friendly  genetic  algorithms.  Chapter  III  presents  the  previously  mentioned  generalizations  of  the 
fast  messy  genetic  algorithm  operators.  The  mathematical  model  of  building  block  filtering  is  developed  in 
Chapter  IV,  and  the  model  of  tournament  selection  is  developed  in  Chapter  V.  The  parameter  selection 
problem is formally posed as an optimization problem and parameter selection techniques are proposed in
Chapter VI. Finally, Chapter VII presents conclusions and recommendations for future research.
II.  Selected Topics in Evolutionary Algorithms

This chapter provides a background in evolutionary algorithms (EAs), focusing on genetic algorithms
(GAs).  Selected  concepts  of  optimization  theory  are  introduced  in  Section  2.1.  The  relationship  of  GAs 
to  the  more  general  class  of  EAs  is  discussed  briefly  in  Section  2.2.  A  considerable  portion  of  the  chapter 
(Section  2.3)  is  devoted  to  development  of  a  formal  framework  for  evolutionary  algorithms,  including  aspects 
which  are  novel  contributions  of  this  research.  The  section  presents  standard  definitions  of  decoding  and 
fitness  scaling  functions,  as  well  as  novel  definitions  of  evolutionary  operators  in  general  and  recombination, 
mutation,  and  selection  operators  in  particular. 

Within  the  formal  framework,  Section  2.4  defines  the  simple  genetic  algorithm  (sGA)  [29],  which  is  a 
primary  focus  in  EA  research.  As  an  optimum  seeking  technique,  the  sGA  exhibits  the  significant  drawback 
that  its  effectiveness  is  sensitive  to  both  permutations  in  the  decoding  function  and  the  choice  of  fitness  scaling 
function.  The  dependence  of  sGA  effectiveness  on  the  fitness  scaling  function  is  examined  theoretically  in 
Section 2.5. The remainder of the chapter discusses linkage-friendly genetic algorithms (Section 2.6), for
which the effectiveness is independent of both permutations in the decoding function and the choice of
fitness scaling function.

2.1  Optimization  Theory 

This  section  introduces  selected  fundamental  concepts  and  terminology  of  optimization  theory.  For  a 
more thorough treatment, including discussion of techniques, see for example Pierre [63]. An optimization
problem involves either maximization or minimization of a function $f : \mathbb{R}^n \to \mathbb{R}$ over a set $\Omega \subset \mathbb{R}^n$. The
function $f$ is known by various names in the optimization literature including objective function, objective
functional, cost function, and performance measure. The set $\Omega$ is known as either the feasible region, the
feasible set, or the admissible set.

A point $x^* \in \Omega$ is called a point of strong local maximum (or maximizer) if there exists an $\epsilon > 0$ such
that if $x \in \Omega - \{x^*\} \neq \{\}$ and $\|x - x^*\| < \epsilon$ then $f(x^*) > f(x)$. In the sequel, such a point $x^*$ is called simply a
point of local maximum. The value $f(x^*)$ is then a local maximum of $f$ in $\Omega$. Definitions of points of (strong)
local minimum and local minimum are of course directly analogous. Any point $x \in \Omega$ which is either a point
of local maximum or a point of local minimum is called a relative extremum point.

If a point $x^* \in \Omega$ satisfies $f(x^*) > f(x)$ for all $x \in \Omega - \{x^*\} \neq \{\}$, then $f(x^*)$ is the absolute maximum
(or global maximum). Similarly, if a point $x^* \in \Omega$ satisfies $f(x^*) < f(x)$ for all $x \in \Omega - \{x^*\} \neq \{\}$, then $f(x^*)$ is
the absolute minimum (or global minimum).
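The distinction between local and absolute maxima can be illustrated numerically. The sketch below is a discretized approximation under stated assumptions, not part of the formal development: the objective function $x^3 - 3x$, the feasible set $[-3, 3]$, and the grid resolution are all hypothetical choices, and a grid point is treated as a point of strong local maximum when its value strictly exceeds that of each adjacent grid point inside the feasible set.

```python
def f(x):
    """Hypothetical objective function used only for illustration."""
    return x**3 - 3 * x


# Uniform grid over the assumed feasible set [-3, 3] with spacing 0.01.
N = 600
grid = [-3 + 6 * i / N for i in range(N + 1)]
vals = [f(x) for x in grid]


def is_strict_local_max(i):
    """True if grid point i strictly beats every adjacent feasible grid point."""
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j <= N]
    return all(vals[i] > vals[j] for j in neighbours)


local_max_points = [grid[i] for i in range(N + 1) if is_strict_local_max(i)]
global_max_point = max(grid, key=f)

print(local_max_points)   # interior maximizer and the right boundary point
print(global_max_point)
```

Under these assumptions the interior point near $-1$ is a point of local maximum but not of absolute maximum; the absolute maximum is attained on the boundary of the feasible set, which the definition permits because only points of the feasible set are compared.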

2.2  Relationship  of  Genetic  Algorithms  to  Evolutionary  Algorithms 

Genetic  algorithms  (GAs)  are  a  form  of  computation  inspired  by  theories  of  evolution.  This  places 
them  in  the  class  of  algorithms  called  Evolutionary  Algorithms  (EAs).  Other  members  of  this  class  include 
Evolution Strategies (ESs) [66, 69] and Evolutionary Programming (EP) [24] (see Figure 1). Thomas Bäck [6]
provides an excellent review of all three, including a historical perspective.

Genetic algorithms were first proposed by Holland in connection with his theories of complex adaptive
systems [42] (where they were called “reproductive plans”). De Jong later applied genetic algorithms to the
functional optimization problem [17]. Since then, a significant portion of the genetic algorithm literature
has been devoted to modifications aimed at improving the effectiveness and efficiency of genetic algorithms
as optimum seeking techniques.

Historically,  the  three  major  areas  developed  independently  between  the  1960’s  and  the  1980’s.  Interest 
in  evolutionary  algorithms,  and  genetic  algorithms  in  particular,  grew  dramatically  in  the  late  1980’s  and 
early 1990’s, as demonstrated by the success of numerous international conferences (see Table 1) and the
appearance of a number of textbooks (e.g. [6, 24, 29, 44, 58, 60, 70]). Also during this period, interaction
between the communities increased, as many researchers began transferring analytical insight and experimental
approaches amongst the three fields.


Table 1.  Major Evolutionary Algorithm Conference Proceedings

Conference                                                  Primary Focus   Proceedings
International Conference on Genetic Algorithms              GAs             [37, 38, 67, 8, 25, 19]
Foundations of Genetic Algorithms                           GAs             [65, 75, 76, 9]
Parallel Problem Solving from Nature                        GAs and ESs     [68, 51, 13, 18]
Annual Conference on Evolutionary Programming               EPs             [20, 21, 71, 52, 23]
IEEE International Conference on Evolutionary Computing     EAs             [59, 22, 72]

2.3  A  Framework  for  Evolutionary  Algorithms 

As a first step towards unifying the theory of the three major evolutionary algorithm paradigms, Bäck
and Schwefel propose a general “algorithmic description” for EAs, which they specialize for each of the three
paradigms [7]. The various mappings appearing in their description are defined so broadly that their essential
characteristics are overlooked. This section develops formal definitions of the mappings which capture these
characteristics, then presents an extension of Bäck and Schwefel’s algorithmic description.

2.3.1 Representation. Associated with each evolutionary algorithm is a non-empty set $I$, called the
individual space of the algorithm. Each individual $a \in I$ represents a candidate solution to the optimization
problem at hand. The representation scheme is formally defined by the decoding function.

Definition 2.3.1 (Decoding function): Let $I$ be a non-empty set (the individual space), and $f : \mathbb{R}^n \to \mathbb{R}$
(the objective function). If $D : I \to \mathbb{R}^n$ is total, i.e. the domain of $D$ is all of $I$, then $D$ is called a decoding
function. □

The mapping $D$ is not necessarily surjective (in fact, it cannot be if $I$ is countable). The range of $D$
determines the subset of $\mathbb{R}^n$ actually available for exploration by the evolutionary algorithm.

The  fitness  of  an  individual  is  an  indication  of  the  quality  of  the  candidate  solution  represented  by  the 
individual.  The  mapping  which  yields  this  indication  is  the  fitness  function.  It  is  the  fitness  function  which 
the  evolutionary  algorithm  actually  attempts  to  optimize. 

Definition 2.3.2 (Fitness function): Let $I$ be a non-empty set (the individual space), $D : I \to \mathbb{R}^n$ (the
decoding function), $f : \mathbb{R}^n \to \mathbb{R}$ (the objective function), and $T_s : \mathbb{R} \to \mathbb{R}$ (the fitness scaling function^).
Then $\Phi = T_s \circ f \circ D$ is called a fitness function. □

In this definition it is understood that the objective function $f$ is determined by the application, while the
specification of the decoding function $D$ and the fitness scaling function $T_s$ are design issues. An important
design criterion for the scaling function is that it preserve the partial ordering induced on the individual space
by the decoding and objective functions.
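The composition of decoding, objective, and scaling functions can be sketched in a few lines. Everything specific here is an assumption made for illustration: individuals are taken to be fixed-length bit strings, each field of `bits_per_var` bits is decoded linearly onto an interval, and `objective` and `scale` are hypothetical choices of the objective and fitness scaling functions.

```python
from typing import Callable, Tuple

Individual = Tuple[int, ...]  # assumption: individuals are fixed-length bit strings


def make_decoder(bits_per_var: int, lo: float, hi: float) -> Callable:
    """Return a total decoding function mapping bit fields to reals in [lo, hi]."""
    def decode(a: Individual) -> Tuple[float, ...]:
        xs = []
        for start in range(0, len(a), bits_per_var):
            field = a[start:start + bits_per_var]
            k = int("".join(map(str, field)), 2)          # unsigned integer value
            xs.append(lo + (hi - lo) * k / (2**bits_per_var - 1))
        return tuple(xs)
    return decode


def objective(x):
    """Hypothetical objective function: negated sum of squares."""
    return -sum(xi * xi for xi in x)


def scale(y):
    """Hypothetical strictly increasing fitness scaling function."""
    return 2.0 * y + 10.0


def compose_fitness(T_s, f, D):
    """Fitness function as the composition: scaling after objective after decoding."""
    return lambda a: T_s(f(D(a)))


D = make_decoder(bits_per_var=4, lo=-1.0, hi=1.0)
phi = compose_fitness(scale, objective, D)

a = (1, 1, 1, 1, 0, 0, 0, 0)
print(D(a), phi(a))
```

With 4 bits per variable the decoder reaches only 16 values per coordinate, which illustrates why a decoding function on a countable individual space cannot be surjective.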

Definition 2.3.3 (Order-preserving fitness scaling function): If for every non-empty set $I$ (the
individual space), every $(a, b) \in I^2$, every $D : I \to \mathbb{R}^n$ (the decoding function), and every $f : \mathbb{R}^n \to \mathbb{R}$
(the objective function), a mapping $T_s : \mathbb{R} \to \mathbb{R}$ (the fitness scaling function) satisfies

$$f(D(a)) < f(D(b)) \implies T_s(f(D(a))) < T_s(f(D(b))) ,$$

then $T_s$ is called an order-preserving fitness scaling function. □

^ Use of the term “scaling” for the mapping $T_s$ is consistent with the genetic algorithms literature, although it is not descriptive
of many of the operators used in practice.
The following lemma provides a condition which is necessary and sufficient for a scaling function to be
order-preserving. 

Lemma  2.3.4  A  fitness  scaling  function  is  order-preserving  if  and  only  if  it  is  strictly  increasing. 

Proof: “If”: Let $T_s : \mathbb{R} \to \mathbb{R}$ be strictly increasing, $I$ a non-empty set, $(a, b) \in I^2$, $D : I \to \mathbb{R}^n$
for some $n \in \mathbb{N}$, $f : \mathbb{R}^n \to \mathbb{R}$, $g_a = f(D(a))$, and $g_b = f(D(b))$. Suppose $f(D(a)) < f(D(b))$. Then
$g_a < g_b$, and because $T_s$ is strictly increasing, $T_s(f(D(a))) = T_s(g_a) < T_s(g_b) = T_s(f(D(b)))$. Thus,
$f(D(a)) < f(D(b)) \implies T_s(f(D(a))) < T_s(f(D(b)))$. Since $I$, $a$, $b$, $D$, and $f$ are arbitrary, $T_s$ is an
order-preserving fitness scaling function.

“Only if”: Let $T_s : \mathbb{R} \to \mathbb{R}$ be an order-preserving fitness scaling function, $x, y \in \mathbb{R}$, $I = \{x, y\}$, $a = x$,
$b = y$, and $D = f : \mathbb{R} \to \mathbb{R}$ the identity mapping. Suppose $x < y$. Then $f(D(a)) = x < y = f(D(b))$,
and because $T_s$ is an order-preserving fitness scaling function, $T_s(x) = T_s(f(D(a))) < T_s(f(D(b))) = T_s(y)$.
Thus, $x < y \implies T_s(x) < T_s(y)$. Since $x$ and $y$ are arbitrary, $T_s$ is strictly increasing. ■
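The lemma can be spot-checked numerically. The sketch below is not a proof: it only tests the order-preserving implication on a small, hypothetical set of objective values, and the helper name `preserves_order` is introduced here for illustration.

```python
import math


def preserves_order(T_s, values):
    """Check that y1 < y2 implies T_s(y1) < T_s(y2) for every pair in values."""
    return all(T_s(y1) < T_s(y2)
               for y1 in values
               for y2 in values
               if y1 < y2)


# Hypothetical objective function values f(D(a)) for a handful of individuals.
objective_values = [-2.0, -0.5, 0.0, 0.5, 3.0]

strictly_increasing = preserves_order(math.atan, objective_values)
non_monotone = preserves_order(lambda y: y * y, objective_values)
only_nondecreasing = preserves_order(math.floor, objective_values)

print(strictly_increasing, non_monotone, only_nondecreasing)
```

The third case shows why strictness matters: a scaling function that is merely nondecreasing can map distinct objective values to equal fitnesses, which breaks the strict inequality required by the definition.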

Execution of an evolutionary algorithm typically begins by randomly sampling individuals from $I$. The
sampling is typically performed with replacement, and the resulting collection of individuals is called the
initial population, denoted $P(0)$.^ More generally, a population is a collection $P = \{a_1, \ldots, a_\mu\}$ of individuals
$a_i \in I$, and the number of individuals $\mu$ in the population is the population size.

Following initialization, execution proceeds iteratively. Each iteration consists of application of one
or more evolutionary operators. The combined effect of the evolutionary operators applied in a particular
generation $t \in \mathbb{N}$ is to transform the current population $P(t)$ into a new population $P(t+1)$.


2.3.2 Evolutionary Operators. Most authors, including Bäck and Schwefel, describe evolutionary
operators as directly mapping populations into populations, with the mapping being “controlled” by the
parameters of the operator. This research proposes a more formal view of evolutionary operators as mappings
from parameter spaces to random population transformations (i.e., random functions^ with values in the set of
population transformations). This view precisely identifies the relationships among the operator parameters
and the various mappings. In the following definitions and the sequel, the set of mappings from a set $S_1$ to
a set $S_2$ is denoted $\mathcal{T}(S_1, S_2)$.

^ In this research, populations are treated interchangeably as n-tuples of individuals or multisets of individuals, as convenient.
The term “multiset” describes the primitive mathematical concept of a collection of elements for which the multiplicities of the
elements, but not their order, are important [1]. For example, the multiset {a, a, b} is equal to the multiset {a, b, a}, and neither
is equal to the multiset {a, a, b, b}. Multisets are also called bags.

One further word regarding terminology. What is referred to as a “population” in the evolutionary algorithms literature is
referred to as a “sample” in the mathematical statistics literature. Similarly, the “individual space” of evolutionary algorithms
corresponds to the “population” of mathematical statistics. The latter is also called the “grand population” [12].

The  first  definition  is  that  of  a  population  transformation,  which  is  any  mapping  from  populations  to 
populations,  whether  or  not  the  populations  are  of  the  same  size  (see  Figure  2). 


Figure 2. The population transformation $T$ deterministically maps the parent population $P$ (of size $\mu$) to
the offspring population $P'$ (of size $\mu'$).

Definition 2.3.5 (Population transformation): Let $I$ be a non-empty set (the individual space), and
$\mu, \mu' \in \mathbb{Z}^+$ (the parent and offspring population sizes, respectively). A mapping $T : I^{\mu} \to I^{\mu'}$ is called a
population transformation. If $T(P) = P'$ then $P$ is called a parent population and $P'$ is called an offspring
population. If $\mu = \mu'$, then they are called simply the population size. □

^ Let $\Omega$ be a sample space, and let $V$ be a set of functions. Then $X : \Omega \to V$ is a random function [10]. The most frequently
encountered random functions are stochastic processes, for which the domain of each $f \in V$ is $\mathbb{R}$.
The population transformation resulting from the application of an evolutionary operator, to include
the  offspring  population  size,  often  depends  on  the  outcome  of  a  random  experiment.  This  dependence 
motivates  the  concept  of  a  random  population  transformation  (Figure  3). 


Figure 3. The random population transformation $R$ maps the random event $\omega$ (with sample space $\Omega$) to
the population transformation $T$, which maps parent populations of size $\mu$ (which is independent
of $\omega$) to offspring populations of some fixed size $\mu' \in \mathbb{Z}^+$ (which may depend on $\omega$).


Definition 2.3.6 (Random population transformation): Let $I$ be a non-empty set (the individual
space), $\mu \in \mathbb{Z}^+$ (the parent population size), and $\Omega$ a set (the sample space). A random function

$$R : \Omega \to \mathcal{T}\left(I^{\mu},\ \bigcup_{\mu' \in \mathbb{Z}^+} I^{\mu'}\right)$$

is called a random population transformation. □

The distribution of population transformations resulting from the application of an evolutionary operator
may depend on one or more parameters of the operator. That is, each evolutionary operator maps its
parameters to a random population transformation (Figure 4).

Figure 4. The evolutionary operator $\mathcal{X}$ maps the exogenous parameter(s) $\Theta$ to the random population
transformation $R$. The underlying sample space of $R$ is $\Omega$. Each of the possible population
transformations acts on populations of size $\mu$. The offspring population size $\mu' \in \mathbb{Z}^+$ may
depend on $\Theta$ as well as the random event $\omega \in \Omega$.


Definition 2.3.7 (Evolutionary operator): Let $I$ be a non-empty set (the individual space), $\mu \in \mathbb{Z}^+$
(the parent population size), $X$ a set (the parameter space), and $\Omega$ a set (the sample space). A mapping

$$\mathcal{X} : X \to \mathcal{T}\left(\Omega,\ \mathcal{T}\left(I^{\mu},\ \bigcup_{\mu' \in \mathbb{Z}^+} I^{\mu'}\right)\right) \qquad (1)$$

is called an evolutionary operator. The set of evolutionary operators in the form of Equation 1 is denoted
$\mathcal{EVOP}(I, \mu, X, \Omega)$. □

The random population transformation $\mathcal{X}(\Theta)$ is denoted $\mathcal{X}_\Theta$. The population transformation $\mathcal{X}_\Theta(\omega)$
is also denoted $\mathcal{X}_\Theta$ to maintain consistency with the notation of Bäck and Schwefel, except where confusion
may arise. In particular, the offspring population $[\mathcal{X}_\Theta(\omega)](P)$ is denoted $\mathcal{X}_\Theta(P)$. Finally, if $\mathcal{X}$ has no
parameters, i.e. $\mathcal{X} \in \mathcal{EVOP}(I, \mu, \{\}, \Omega)$, then the offspring population is denoted $\mathcal{X}(P)$.
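The layered view of an evolutionary operator, from parameter to random population transformation to population transformation, corresponds naturally to nested functions. The following sketch is one possible rendering under explicit assumptions: individuals are bit tuples, the sample space is modeled by integer seeds for a pseudorandom generator, and `bit_flip_operator` with its parameter `p_flip` is a hypothetical example rather than an operator defined in this research.

```python
import random


def bit_flip_operator(p_flip):
    """Map the parameter p_flip to a random population transformation."""
    def random_transformation(omega):
        """Map the outcome omega (modeled as a seed) to a deterministic transformation."""
        def transformation(population):
            # A fresh generator per call makes the map deterministic given omega.
            rng = random.Random(omega)
            return tuple(
                tuple(bit ^ (rng.random() < p_flip) for bit in individual)
                for individual in population
            )
        return transformation
    return random_transformation


parents = ((0, 0, 0, 0), (1, 1, 1, 1))

X_theta = bit_flip_operator(0.25)   # random population transformation
T = X_theta(42)                     # population transformation for one outcome
offspring = T(parents)
print(offspring)
```

Once the parameter and the outcome are both fixed, the innermost function is an ordinary deterministic mapping from parent populations to offspring populations; all randomness enters through the choice of outcome.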
The specific evolutionary operators used are typically biologically inspired. The guiding principle in
their design is typically loose analogy to Darwin’s principle of “survival of the fittest.” The most commonly
used evolutionary operators are recombination, mutation, and selection.

Recombination operators are the most general of the three. The distinguishing characteristic of recombination
operators is that at least some of the individuals in the offspring population may depend on more
than one individual in the parent population. The following definition reflects this characteristic. Because
of this, it is more restrictive than the overly general definition adopted by Bäck and Schwefel, which admits
any population transformation $r : I^{\mu} \to I^{\mu'}$, where $\mu, \mu' \in \mathbb{Z}^+$.

Definition 2.3.8 (Recombination operator): Let $r \in \mathcal{EVOP}(I, \mu, X, \Omega)$. If there exist $P \in I^{\mu}$, $\Theta \in X$,
and $\omega \in \Omega$ such that at least one individual in the offspring population $r_\Theta(P)$ depends on more than one
individual of $P$ then $r$ is called a recombination operator. □

In  contrast  to  recombination  operators,  the  distinguishing  feature  of  mutation  operators  is  that  each  of  the 
individuals  in  the  offspring  population  depends  on  at  most  one  individual  in  the  parent  population. 

Definition 2.3.9 (Mutation operator): Let $m \in \mathcal{EVOP}(I, \mu, X, \Omega)$. If for every $P \in I^{\mu}$, every $\Theta \in X$,
and every $\omega \in \Omega$, each individual in the offspring population $m_\Theta(P)$ depends on at most one individual of $P$
then $m$ is called a mutation operator. □

This definition of mutation is more general than Bäck and Schwefel’s, which assumes that parent and offspring
population sizes are equal.
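The dependence on more than one parent that distinguishes recombination from mutation can be seen in a minimal crossover sketch. The code assumes bit-tuple individuals of equal length and uses a basic one-point crossover purely as an illustration; it is not the generalized single-point crossover of the fast messy genetic algorithm, and the names `one_point_crossover` and `recombine` are hypothetical.

```python
import random


def one_point_crossover(parent_a, parent_b, cut):
    """Child takes genes [0, cut) from parent_a and the remainder from parent_b."""
    return parent_a[:cut] + parent_b[cut:]


def recombine(population, omega):
    """Population transformation in which each offspring is built from two parents."""
    rng = random.Random(omega)           # omega models the random outcome as a seed
    offspring = []
    for _ in range(len(population)):
        a, b = rng.sample(population, 2)  # two distinct positions in the population
        cut = rng.randint(1, len(a) - 1)  # interior cut point, so both parents contribute
        offspring.append(one_point_crossover(a, b, cut))
    return tuple(offspring)


population = ((0, 0, 0, 0), (1, 1, 1, 1))
children = recombine(population, omega=7)
print(children)
```

Because the cut point is interior, every child in this example contains genes from both parents, so no child could have been produced by an operator in which each offspring depends on at most one parent.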

The distinguishing characteristics of selection operators are that every individual in the offspring population
is also a member of the parent population, and that the population transformation depends on the
fitnesses of the individuals in the parent population. The following definition reflects these characteristics,
in contrast to Bäck and Schwefel’s, which admits any population transformation $s : (I^{\lambda} \cup I^{\mu+\lambda}) \to I^{\mu}$.

Definition 2.3.10 (Selection operator): Let $s \in \mathcal{EVOP}(I, \mu, X \times \mathcal{T}(I, \mathbb{R}), \Omega)$. If for every $P \in I^{\mu}$,
every $\Theta \in X$, and every fitness function $\Phi : I \to \mathbb{R}$, $s$ satisfies

$$a \in s_{(\Theta,\Phi)}(P) \implies a \in P ,$$

then $s$ is called a selection operator. □

Bäck [6] formally defines specific probabilistic selection operators in terms of selection probabilities.
The  following  definition  is  equivalent  to  his,  but  notationally  reflects  the  usual  dependence  of  selection 
probabilities  on  both  the  parent  population  and  the  fitness  function. 

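Definition 2.3.11 (Selection probability): Let $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ be a selection operator,
$\Theta \in X$, $\Phi : I \to \mathbb{R}$ a fitness function, $P \in I^\mu$, and $a \in P$. Then

$$p_{sel}(a; s_{(\Theta,\Phi)}, P) \triangleq \Pr[\, a \in s_{(\Theta,\Phi)}(P) \mid a \in P \,]$$

is the selection probability assigned to $a \in P$ by $s_{(\Theta,\Phi)}$. □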

A  selection  operator  is  order-based  if  order  preserving  transformations  of  the  fitness  function  also  preserve 
selection  probabilities  of  individuals. 

Definition 2.3.12 (Order-based selection operator): Let $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ be a selection
operator. If for every $\Theta \in X$ (the operator parameters), every $D : I \to \mathbb{R}^n$ (the decoding function), every
$f : \mathbb{R}^n \to \mathbb{R}$ (the objective function), every order-preserving fitness scaling function $T_s : \mathbb{R} \to \mathbb{R}$, every
population $P$, and every individual $a \in P$, $s$ satisfies

$$p_{sel}(a; s_{(\Theta, f \circ D)}, P) = p_{sel}(a; s_{(\Theta, T_s \circ f \circ D)}, P) ,$$

then $s$ is called an order-based selection operator. □
All linkage-friendly genetic algorithms (defined in Section 2.6) use order-based selection operators.
This research focuses on those which use tournament selection (defined in Section 2.6.2.3). Other examples of
order-based selection operators include ranking selection, $(\mu, \lambda)$ selection, and $(\mu + \lambda)$ selection, which are
discussed by Bäck [6].

2.3.3  Algorithmic  Specification.  The  preceding  definitions  of  the  various  types  of  evolutionary 
operators permit the following formal definition of an evolutionary algorithm, due essentially to Bäck and
Schwefel. 

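Definition 2.3.13 (Evolutionary algorithm): Let

• $I$ be a non-empty set (the individual space),

• $\{\mu^{(i)}\}$ a sequence in $\mathbb{Z}^+$ (the parent population sizes),

• $\{\mu'^{(i)}\}$ a sequence in $\mathbb{Z}^+$ (the offspring population sizes),

• $\Phi : I \to \mathbb{R}$ a fitness function,

• $\iota$ a mapping from sets of populations to $\{\text{true}, \text{false}\}$ (the termination criterion),

• $\chi \in \{\text{true}, \text{false}\}$,

• $r$ a sequence $\{r^{(i)}\}$ of recombination operators

$$r^{(i)} : X_r^{(i)} \to T\big(\Omega_r^{(i)}, T(I^{\mu^{(i)}}, I^{\mu'^{(i)}})\big) ,$$

• $m$ a sequence $\{m^{(i)}\}$ of mutation operators

$$m^{(i)} : X_m^{(i)} \to T\big(\Omega_m^{(i)}, T(I^{\mu'^{(i)}}, I^{\mu'^{(i)}})\big) ,$$

• $s$ a sequence $\{s^{(i)}\}$ of selection operators

$$s^{(i)} : X_s^{(i)} \times T(I, \mathbb{R}) \to T\big(\Omega_s^{(i)}, T(I^{\mu'^{(i)}} \cup I^{\mu'^{(i)} + \mu^{(i)}}, I^{\mu^{(i+1)}})\big) ,$$

• $\Theta_r^{(i)} \in X_r^{(i)}$ (the recombination parameters),

• $\Theta_m^{(i)} \in X_m^{(i)}$ (the mutation parameters), and

• $\Theta_s^{(i)} \in X_s^{(i)}$ (the selection parameters).

Then the algorithm shown in Figure 5 is called an evolutionary algorithm. □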


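t := 0;
initialize $P(0) := \{a_1(0), \ldots, a_{\mu^{(0)}}(0)\} \in I^{\mu^{(0)}}$;
while ($\iota(\{P(0), \ldots, P(t)\}) \neq$ true) do
    recombine: $P'(t) := r^{(t)}_{\Theta_r^{(t)}}(P(t))$;
    mutate: $P''(t) := m^{(t)}_{\Theta_m^{(t)}}(P'(t))$;
    select: if $\chi$
        then $P(t+1) := s^{(t)}_{(\Theta_s^{(t)}, \Phi)}(P''(t))$;
        else $P(t+1) := s^{(t)}_{(\Theta_s^{(t)}, \Phi)}(P''(t) \cup P(t))$;
    fi
    t := t + 1;
od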


Figure  5.  Outline  of  an  Evolutionary  Algorithm 
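
The loop of Figure 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the framework itself: the function names, the calling convention (each operator receives the population and the generation index), and the toy operators in `run_demo` are assumptions introduced here, and the flag `chi` follows the reading of the figure in which a true value restricts selection to the mutated offspring.

```python
import random


def evolutionary_algorithm(p0, recombine, mutate, select, fitness, terminate, chi=True):
    """Generic loop in the shape of Figure 5 (illustrative sketch)."""
    history = [list(p0)]                      # P(0), ..., P(t)
    t = 0
    while not terminate(history):
        parents = history[t]
        recombined = recombine(parents, t)    # P'(t)
        mutated = mutate(recombined, t)       # P''(t)
        # chi chooses whether the parents also compete in selection
        pool = mutated if chi else mutated + parents
        history.append(select(pool, fitness, t))  # P(t+1)
        t += 1
    return history


def run_demo(seed=0, mu=6, length=8, t_f=5):
    """Toy instantiation: bit strings, one-bit flips, keep-the-best selection."""
    rng = random.Random(seed)
    p0 = [tuple(rng.randint(0, 1) for _ in range(length)) for _ in range(mu)]

    def no_recombination(pop, t):
        return list(pop)

    def flip_one_bit(pop, t):
        # each offspring depends on exactly one parent, as a mutation operator must
        children = []
        for a in pop:
            j = rng.randrange(length)
            children.append(a[:j] + (1 - a[j],) + a[j + 1:])
        return children

    def keep_best(pool, fitness, t):
        # every selected individual is a member of the pool
        return sorted(pool, key=fitness, reverse=True)[:mu]

    def fixed_generations(history):
        # terminate once more than t_f populations have been produced
        return len(history) > t_f

    return evolutionary_algorithm(p0, no_recombination, flip_one_bit, keep_best,
                                  sum, fixed_generations, chi=False)
```

Because the operators are passed in as functions of the generation index, the same loop accommodates operators and parameters that vary over the course of execution.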

This definition differs from Bäck and Schwefel's in several ways. First, and most importantly, the
population  sizes,  operators,  and  parameters  are  all  represented  as  sequences,  reflecting  the  fact  that  certain 
evolutionary  algorithms  use  varying  population  sizes,  use  multiple  phases  of  execution  in  which  different 
operators  are  applied,  and  vary  their  parameters  over  the  course  of  execution.  In  particular,  some  linkage- 
friendly  genetic  algorithms  (defined  in  Section  2.6)  exhibit  these  characteristics. 

Another difference between the definitions is that in Figure 5, the termination condition $\iota$ depends
on the set of populations $\{P(0), \ldots, P(t)\}$. Many evolutionary algorithms terminate after a fixed number
of generations (corresponding to a termination criterion satisfying $\iota(\{P(0), \ldots, P(t)\}) = \text{true} \iff
\operatorname{card}(\{P(0), \ldots, P(t)\}) > t_f$ for a final generation $t_f$), or based on conditions involving populations previous
to the current generation. Both definitions fail to include evolutionary algorithms which terminate based on
conditions involving the number of function evaluations performed.

Two further differences are notational. The variable $\chi$ is introduced to preserve Bäck and Schwefel's
explicit representation of selection operators which act on populations of size $\mu' + \mu$, as well as those which
act on populations of size $\mu'$. In Bäck and Schwefel's definition, selection acts on the population $P''(t) \cup Q$,
where $Q \in \{\emptyset, P(t)\}$.

Finally,  the  fitness  function  is  represented  as  a  parameter  of  the  selection  operator.  Consequently, 
explicit  statement  of  the  evaluation  step  is  unnecessary. 

To  summarize,  the  concepts  developed  in  this  section  include  population  transformations,  random 
population  transformations,  and  general  evolutionary  operators,  as  well  as  recombination,  mutation,  and 
selection  operators.  The  development  results  in  a  general  yet  precise  formal  framework  for  the  class  of 
evolutionary  algorithms.  In  later  sections,  specific  algorithms  are  defined  in  the  context  of  this  framework. 

2.4 Simple Genetic Algorithms

Much of the genetic algorithms literature relates to the simple genetic algorithm (sGA), defined by
Goldberg [29] based on Holland's seminal work [43], or slight variations thereof. This section defines the sGA
in the framework of evolutionary algorithms established in Section 2.3. Section 2.4.1 discusses the fixed-length
binary string representation used by the sGA. Section 2.4.2 discusses the evolutionary operators
of the sGA: single-point crossover (Section 2.4.2.1), point mutation (Section 2.4.2.2), and stochastic selection
with replacement (Section 2.4.2.3). Finally, Section 2.4.3 presents the specification of the sGA.

2.4.1 Representation. Let $A$ be a non-empty set (the genic alphabet), $\ell \in \mathbb{Z}^+$ (the string length),
and $\mathcal{L} = \{1, \ldots, \ell\}$ (the loci). Then the individual space is $I = A^\ell$, and an individual is a finite sequence
$a = (a_1, \ldots, a_\ell) \in I$. Individuals are sometimes referred to as chromosomes*. Each $a_i \in A$ is an allele, each
$i \in \mathcal{L}$ is a locus, and each ordered pair $(a_i, i)$ is a gene. For many genetic algorithms $A = \{0, 1\}$, in which
case the individuals are described as binary, and the algorithm is called binary-coded. The simple genetic
algorithm is binary-coded.

2.4.2 Genetic Operators. This section discusses the evolutionary operators used by the simple
genetic algorithm. The single-point crossover operator is defined in Section 2.4.2.1, and Section 2.4.2.2
defines the point mutation operator. Finally, roulette wheel selection is defined in Section 2.4.2.3.

2.4.2.1 Recombination. The recombination operators used in genetic algorithms are called
crossover operators. They are traditionally viewed by genetic algorithm researchers as the primary mechanism
by which new solutions are introduced to the genetic algorithm search process. Numerous crossover operators
are in use, including single-point crossover, two-point crossover, multi-point crossover, uniform crossover,
and a host of domain-specific crossover operators, especially in the context of combinatorial optimization
problems.

The simple genetic algorithm uses single-point crossover, which is parameterized by the probability
of crossover $p_c$. The individuals in the parent population are randomly paired, and a crossover point is
randomly chosen for each pair. Those portions of the parent individuals following the crossover points are
exchanged with probability $p_c$ to form pairs of offspring individuals. In the following definition, and the
sequel, the set of permutations on $\{1, \ldots, n\}$ is denoted $\Pi_n$.

Definition 2.4.1 (Single-point crossover operator): Let $A$ be a non-empty set (the genic alphabet),
$\ell \in \mathbb{Z}^+$ (the individual length), $I = A^\ell$ (the individual space), $\mu = \mu' \in \mathbb{Z}^+$ (the population size), $\nu = \lfloor \mu/2 \rfloor$,
$\Omega = \Pi_\mu \times [0,1]^\nu \times \{1, \ldots, \ell - 1\}^\nu$, $\omega = (\sigma, X, Y) \sim U(\Omega)$, and $r : \mathbb{R} \to T(\Omega, T(I^\mu, I^\mu))$ an evolutionary
operator. If for every $p_c \in [0,1]$ (the probability of crossover), $P \in I^\mu$, $i \in \{1, \ldots, \nu\}$, and $j \in \{1, \ldots, \ell\}$, $r$
satisfies

$$[[r_{p_c}(P)]_{2i-1}]_j = \begin{cases} [P_{\sigma(2i-1)}]_j , & \text{if } X_i > p_c \text{ or } j \le Y_i \\ [P_{\sigma(2i)}]_j , & \text{if } X_i \le p_c \text{ and } j > Y_i \end{cases}$$

$$[[r_{p_c}(P)]_{2i}]_j = \begin{cases} [P_{\sigma(2i)}]_j , & \text{if } X_i > p_c \text{ or } j \le Y_i \\ [P_{\sigma(2i-1)}]_j , & \text{if } X_i \le p_c \text{ and } j > Y_i \end{cases}$$

and

$$[r_{p_c}(P)]_\mu = P_{\sigma(\mu)} , \quad \text{if } \mu \text{ is odd} ,$$

then $r$ is called a single-point crossover operator. □

*A word on terminology is in order. In much of the GA literature, terms from evolution theory, biology, and genetics are
used (abused?) freely to refer to the computational concepts which they inspire. This tradition encourages anthropomorphizing
the algorithms, but in the interest of remaining consistent with the literature this research adopts the standard terminology.

Single-point  crossover  is  illustrated  in  Figure  6.  Note  that  single-point  crossover  is  restricted  to  parents  of 
equal  length. 
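
The following Python sketch illustrates the behaviour described above. It is an assumption-laden illustration, not the formal operator: the function name, the tuple representation of individuals, and the optional `rng` argument are introduced here for convenience, and individuals are assumed to have length at least two. The shuffle plays the role of the permutation $\sigma$, `x` that of $X_i$, and `y` that of $Y_i$.

```python
import random


def single_point_crossover(population, p_c, rng=random):
    """Pair parents at random and swap tails with probability p_c (sketch)."""
    mu = len(population)
    ell = len(population[0])
    order = list(range(mu))
    rng.shuffle(order)                    # random pairing (the permutation sigma)
    offspring = []
    for i in range(mu // 2):
        a = population[order[2 * i]]
        b = population[order[2 * i + 1]]
        x = rng.random()                  # X_i, uniform on [0, 1)
        y = rng.randint(1, ell - 1)       # Y_i, uniform on {1, ..., ell - 1}
        # strict comparison: the boundary case x == p_c has probability zero
        if x < p_c:
            a, b = a[:y] + b[y:], b[:y] + a[y:]
        offspring += [a, b]
    if mu % 2 == 1:
        offspring.append(population[order[-1]])  # unpaired individual is copied
    return offspring
```

Since tails are exchanged position by position, the multiset of alleles found at each locus is the same in the parent and offspring populations.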


[Figure 6 shows Parent 1 and Parent 2 above Offspring 1 and Offspring 2, with the crossover point marked.]

Figure 6. A single-point crossover operator acting on a parent population of size $\mu = 2$.


2.4.2.2 Mutation. Mutation is viewed as a "background" operator by many genetic algorithm
researchers, which is very much in contrast to the view of the evolution strategies community. The simple
genetic algorithm uses a mutation operator called point mutation, which is parameterized by the probability
of mutation $p_m$.† Each allele of each individual in the population is mutated with independent probability
$p_m$.

Definition 2.4.2 (Point mutation operator): Let $A$ be a non-empty set (the genic alphabet), $\ell \in \mathbb{Z}^+$
(the individual length), $I = A^\ell$ (the individual space), $\mu = \mu' \in \mathbb{Z}^+$ (the population size), $\Omega = [0,1]^{\mu \times \ell} \times A^{\mu \times \ell}$,
$\omega = (X, Y) \sim U(\Omega)$, and $m : \mathbb{R} \to T(\Omega, T(I^\mu, I^\mu))$ an evolutionary operator. If for every
$p_m \in [0,1]$ (the probability of mutation), $P \in I^\mu$, $i \in \{1, \ldots, \mu\}$, and $j \in \{1, \ldots, \ell\}$, $m$ satisfies

$$[[m_{p_m}(P)]_i]_j = \begin{cases} Y_{ij} , & \text{if } X_{ij} \le p_m \\ [P_i]_j , & \text{if } X_{ij} > p_m \end{cases}$$

then $m$ is called a point mutation operator. □


Point  mutation  is  illustrated  in  Figure  7. 
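
A short Python sketch of point mutation follows. It assumes the convention adopted in this research, in which the replacement allele is drawn from the whole genic alphabet and may therefore equal the parent allele; the function name, the default binary alphabet, and the `rng` argument are illustrative choices made here.

```python
import random


def point_mutation(population, p_m, alphabet=(0, 1), rng=random):
    """Mutate each allele independently with probability p_m (sketch)."""
    offspring = []
    for individual in population:
        child = []
        for allele in individual:
            x = rng.random()              # X_ij, uniform on [0, 1)
            y = rng.choice(alphabet)      # Y_ij, may coincide with the parent allele
            child.append(y if x < p_m else allele)
        offspring.append(tuple(child))
    return offspring
```

Under this convention, `p_m = 1.0` replaces every allele by a uniform draw from the alphabet, which corresponds to random search.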


[Figure 7 shows a parent and its offspring, with the mutation point marked.]

Figure 7. A point mutation operator acting on a population containing a single individual (the parent). A
single mutation point is depicted.


2.4.2.3 Selection. Holland's original description of the genetic algorithm, on which the
simple genetic algorithm is based, specifies that each individual's selection probability is proportional to its
fitness. Selection operators which exhibit this characteristic are referred to as fitness proportionate. Fitness
proportionate selection operators have no parameters other than the fitness function, and require that each
individual have positive fitness.

†There is some ambiguity regarding the term "probability of mutation." In one convention, the allele of the parent individual
is included in the set from which the allele of the offspring individual is drawn. For this convention, a probability of mutation
$p_m = 1.0$ specifies random search, regardless of the cardinality of the genic alphabet. In the other convention, the allele of the
parent individual is excluded. For this convention $p_m = (C-1)/C$ specifies random search, where $C$ is the cardinality of the genic
alphabet (which is assumed to be finite, else the point is moot). For convenience, the definition adopted in this research follows
the former convention. This is in contrast to the convention followed in the specification of the simple genetic algorithm [29].

Definition 2.4.3 (Fitness proportionate selection operator): Let $s \in \mathcal{EVOP}(I, \mu, T(I, \mathbb{R}^+), \Omega)$ be a
selection operator. If for every $P \in I^\mu$, and fitness function $\Phi : I \to \mathbb{R}^+$, the selection probability of each
individual $a \in P$ satisfies

$$p_{sel}(a; s_\Phi, P) = \frac{\Phi(a)}{\sum_{i=1}^{\mu} \Phi(P_i)} ,$$

then $s$ is called a fitness proportionate selection operator. □

The simple genetic algorithm uses a fitness proportionate selection operator called stochastic sampling
with replacement, also known as roulette wheel selection [29]. The latter term is motivated by imagining that
each individual is assigned an arc on the perimeter of a roulette wheel, the length of which is proportional
to the individual's fitness. Members of the offspring population are selected by "spinning the wheel," and
including in the offspring population a copy of the individual within whose arc the roulette ball lands. The
spins of the wheel correspond to the components of the random vector $\omega$ in the following definition.

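Definition 2.4.4 (Stochastic selection with replacement operator): Let $\Omega = [0,1]^\mu$, $\omega \sim U(\Omega)$,
$s \in \mathcal{EVOP}(I, \mu, T(I, \mathbb{R}^+), \Omega)$, and

$$\alpha(k; s, \Phi, P) \triangleq \min\Big\{ j : \sum_{i=1}^{j} p_{sel}(P_i; s_\Phi, P) > \omega_k \Big\} .$$

If for every fitness function $\Phi : I \to \mathbb{R}^+$ and every population $P \in I^\mu$, $s$ satisfies

$$[s_\Phi(P)]_k = P_{\alpha(k; s, \Phi, P)} \quad \text{for each } k \in \{1, \ldots, \mu\} ,$$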


then  s  is  called  a  stochastic  selection  with  replacement  operator. 
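
The roulette wheel description can be sketched in Python as follows. The function name and the `rng` argument are assumptions made for illustration, and fitness values are assumed to be strictly positive. Each spin draws one uniform number, and the selected individual is the first whose cumulative selection probability exceeds it.

```python
import random
from itertools import accumulate


def roulette_wheel_selection(population, fitness, rng=random):
    """Stochastic sampling with replacement, one spin per offspring (sketch)."""
    weights = [fitness(a) for a in population]
    total = sum(weights)
    cumulative = list(accumulate(w / total for w in weights))
    selected = []
    for _ in range(len(population)):
        u = rng.random()                  # one spin of the wheel
        # first index whose cumulative probability exceeds u; the fallback
        # guards against rounding that leaves the last value just below 1
        j = next((j for j, c in enumerate(cumulative) if c > u),
                 len(population) - 1)
        selected.append(population[j])
    return selected
```

For a population with fitnesses 1, 1, and 2 the cumulative probabilities are 0.25, 0.5, and 1.0, so spins of 0.1, 0.3, and 0.9 select the first, second, and third individual respectively.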
2.4.3 Algorithmic Specification. The preceding sections describe the individual space and each of
the  evolutionary  operators  of  a  simple  genetic  algorithm.  This  section  specifies  the  simple  genetic  algorithm 
in  the  formal  framework  developed  in  Section  2.3. 

Definition 2.4.5 (Simple genetic algorithm): Let

• $\ell \in \mathbb{Z}^+$ (the individual length),

• $I = \{0, 1\}^\ell$ (the individual space),

• $t_f \in \mathbb{Z}^+$ (the final generation),

• $\mu = \mu' \in \mathbb{Z}^+$ (the population size),

• $\Phi : I \to \mathbb{R}$ a fitness function,

• $\iota$ a mapping from sets of populations to $\{\text{true}, \text{false}\}$ (the termination criterion) such that

$$\iota(\{P(0), \ldots, P(t)\}) = \text{true} \iff \operatorname{card}(\{P(0), \ldots, P(t)\}) > t_f ,$$

• $r \in \mathcal{EVOP}(I, \mu, \mathbb{R}, \Omega_r)$ a single-point crossover operator,

• $m \in \mathcal{EVOP}(I, \mu, \mathbb{R}, \Omega_m)$ a point mutation operator,

• $s : T(I, \mathbb{R}^+) \to T(\Omega_s, T(I^\mu, I^\mu))$ a stochastic selection with replacement operator, and

• $\Theta_r, \Theta_m \in \mathbb{R}$.

Then the algorithm shown in Figure 8 is called a simple genetic algorithm. □

t := 0;
initialize $P(0) := \{a_1(0), \ldots, a_\mu(0)\} \in I^\mu$;
while ($\iota(\{P(0), \ldots, P(t)\}) \neq$ true) do
    recombine: $P'(t) := r_{\Theta_r}(P(t))$;
    mutate: $P''(t) := m_{\Theta_m}(P'(t))$;
    select: $P(t+1) := s_\Phi(P''(t))$;
    t := t + 1;
od

Figure 8. Outline of a Simple Genetic Algorithm

Although the simple genetic algorithm is in widespread use as an optimum seeking technique, it suffers
from at least two significant disadvantages in this application compared to other evolutionary algorithms.
One drawback is that its effectiveness with respect to a given application depends on the decoding function.
In particular, the effectiveness typically depends on the "order" in which the genes are mapped to the object
variables of the objective function. Because the order is fixed, the simple genetic algorithm possesses no
mechanism by which to detect "linkage" between strongly interacting genes and adapt the representation
scheme accordingly.

Another  limitation  of  the  simple  genetic  algorithm  is  that  its  effectiveness  also  depends  on  the  fitness 
scaling  function.  This  dependence  is  directly  attributable  to  the  use  of  fitness  proportionate  selection.  This 
relationship  is  addressed  in  more  detail  in  Section  2.5,  where  it  is  also  shown  that  algorithms  using  order- 
based  selection  operators  do  not  share  this  disadvantage. 

2.5  Invariance  Properties  of  Selection  Operators 

The  set  of  evolutionary  operators  employed  by  an  evolutionary  algorithm  determines  the  effectiveness 
of  the  algorithm  for  a  given  application.  This  section  identifies  several  properties  which  characterize  certain 
selection  operators,  and  which  relate  to  the  effects  those  operators  have  on  an  algorithm's  effectiveness  and 
efficiency.  Most  importantly,  the  class  of  strictly  invariant  selection  operators  is  defined  and  shown  to  be 
equivalent  to  the  class  of  order-based  selection  operators.  All  linkage-friendly  genetic  algorithms  (defined  in 
Section  2.6)  use  selection  operators  of  this  type. 

If two functions $f : \mathbb{R}^n \to \mathbb{R}$ and $\tilde{f} : \mathbb{R}^n \to \mathbb{R}$ are related by $\tilde{f}(\cdot) = a f(\cdot) + b$ where $a, b \in \mathbb{R}$ and $a > 0$,
then $f$ and $\tilde{f}$ share the same (local and global) maxima and minima, as well as other important properties.
Intuitively, a desirable characteristic for an optimum seeking technique is that it be equally effective with
respect to such functions. This characteristic is closely related to the selection operator properties of scale
invariance and translation invariance. These properties are defined by de la Maza and Tidor [15], although
their implications for commonly used genetic algorithm selection operators were well understood in earlier
studies (see Grefenstette and Baker [40], for example).

The  following  definitions  are  equivalent  to  those  proposed  by  de  la  Maza  and  Tidor.  A  selection 
operator  is  scale  invariant  if  the  selection  probabilities  which  it  assigns  are  preserved  when  the  fitness  function 
is  multiplied  by  a  positive  scalar. 

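Definition 2.5.1 (Scale invariant selection operator): Let $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ be a
selection operator. If for every $\Theta \in X$, every fitness function $\Phi : I \to \mathbb{R}$, every population $P \in I^\mu$, every
individual $a \in P$, and every $c \in \mathbb{R}^+$

$$p_{sel}(a; s_{(\Theta, c\Phi)}, P) = p_{sel}(a; s_{(\Theta, \Phi)}, P) ,$$

then $s$ is called a scale invariant selection operator. □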

All  selection  operators  in  common  use  are  scale  invariant.  In  contrast,  some  commonly  used  selection 
operators,  including  all  fitness  proportionate  operators,  are  not  translation  invariant.  A  selection  operator  is 
translation  invariant  if  the  selection  probabilities  which  it  assigns  are  preserved  when  a  constant  (function) 
is  added  to  the  fitness  function. 

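Definition 2.5.2 (Translation invariant selection operator): Let $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ be
a selection operator, and $u : I \to \mathbb{R}$ such that $u(a) = 1$ for every $a \in I$. If for every $\Theta \in X$, every fitness
function $\Phi : I \to \mathbb{R}$, every population $P \in I^\mu$, every individual $a \in P$, and every $c \in \mathbb{R}$

$$p_{sel}(a; s_{(\Theta, \Phi)}, P) = p_{sel}(a; s_{(\Theta, \Phi + cu)}, P) ,$$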


then  s  is  called  a  translation  invariant  selection  operator. 


□ 
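
A small numerical check makes the contrast between the two properties concrete for fitness proportionate selection. The helper below simply evaluates the probabilities of Definition 2.4.3; the function name and the two-individual population are illustrative assumptions.

```python
def proportionate_probabilities(fitnesses):
    """Fitness proportionate probabilities: each fitness over the population total."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


base = [1.0, 3.0]
scaled = [10.0 * f for f in base]       # fitness multiplied by c = 10
shifted = [f + 2.0 for f in base]       # constant c = 2 added to the fitness

print(proportionate_probabilities(base))     # [0.25, 0.75]
print(proportionate_probabilities(scaled))   # [0.25, 0.75]
print(proportionate_probabilities(shifted))  # [0.375, 0.625]
```

Scaling leaves the probabilities unchanged because the factor cancels in the ratio, whereas adding a constant changes the numerator and the denominator by different proportions.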
The use of selection operators which are not translation invariant, including those which are fitness
proportionate, leads some researchers to develop a large body of empirical knowledge regarding appropriate fitness
scaling functions for various applications (see Michalewicz [57], for example).

In  contrast,  this  research  is  primarily  concerned  with  selection  operators  which  are  translation  invariant. 
More  specifically,  it  is  concerned  with  the  class  of  selection  operators  which  are  invariant  under  every  strictly 
increasing  transformation. 

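Definition 2.5.3 (Strictly invariant selection operator): Let $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ be a
selection operator. If for every $\Theta \in X$, every fitness function $\Phi : I \to \mathbb{R}$, every population $P \in I^\mu$, every
individual $a \in P$, and every strictly increasing function $g : \mathbb{R} \to \mathbb{R}$

$$p_{sel}(a; s_{(\Theta, \Phi)}, P) = p_{sel}(a; s_{(\Theta, g \circ \Phi)}, P) ,$$

then $s$ is called a strictly invariant selection operator. □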

Because functions of the form $f(x) = ax + b$ are strictly increasing when $a > 0$, strict invariance implies
both  scale  and  translation  invariance.  This  is  stated  formally  in  the  following  theorem: 

Theorem  2.5.4  Let  s  be  a  strictly  invariant  selection  operator.  Then  s  is  scale  invariant  and  translation 
invariant. 

Proof: By the definition of a selection operator, $s \in \mathcal{EVOP}(I, \mu, X \times T(I, \mathbb{R}), \Omega)$ for some non-empty set
$I$, $\mu \in \mathbb{Z}^+$, set $X$ (the parameter space), and set $\Omega$ (the sample space). Let $\Theta \in X$, $\Phi : I \to \mathbb{R}$, $P \in I^\mu$, and
$a \in P$.

Let $c \in \mathbb{R}^+$ and define $g : \mathbb{R} \to \mathbb{R}$ by $g(x) = cx$. Then $g$ is strictly increasing. Because $s$ is
strictly invariant, $p_{sel}(a; s_{(\Theta, \Phi)}, P) = p_{sel}(a; s_{(\Theta, g \circ \Phi)}, P) = p_{sel}(a; s_{(\Theta, c\Phi)}, P)$. Because $\Theta$, $\Phi$, $P$, $a$, and $c$ are
arbitrary, $s$ is scale invariant.

Let $c \in \mathbb{R}$ and define $g : \mathbb{R} \to \mathbb{R}$ by $g(x) = x + c$, and $u : I \to \mathbb{R}$ such that $u(a) = 1$ for every $a \in I$.
Then $g$ is strictly increasing, so that $p_{sel}(a; s_{(\Theta, \Phi)}, P) = p_{sel}(a; s_{(\Theta, g \circ \Phi)}, P) = p_{sel}(a; s_{(\Theta, \Phi + cu)}, P)$. Because
$\Theta$, $\Phi$, $P$, $a$, and $c$ are arbitrary, $s$ is translation invariant. ■

The  following  theorem  states  that  strictly  invariant  selection  operators  necessarily  assign  selection 
probabilities  based  solely  on  the  (possibly  partial)  ordering  induced  on  the  individual  space  by  the  fitness 
function,  and  that  all  selection  operators  which  assign  selection  probabilities  in  such  a  manner  are  strictly 
invariant. 

Theorem  2.5.5  A  selection  operator  is  strictly  invariant  if  and  only  if  it  is  order-based. 

Proof: Let $s$ be a selection operator. Then by the definition of a selection operator, $s \in \mathcal{EVOP}(I, \mu, X \times
T(I, \mathbb{R}), \Omega)$ for some non-empty set $I$, $\mu \in \mathbb{Z}^+$, set $X$ (the parameter space), and set $\Omega$ (the sample space).

"If": Suppose $s$ is an order-based selection operator. Let $\Theta \in X$, $\Phi = T_s \circ f \circ D : I \to \mathbb{R}$ a fitness
function, $P \in I^\mu$, $a \in P$, and $g : \mathbb{R} \to \mathbb{R}$ strictly increasing. Define $\tilde{f} = T_s \circ f$. Then $\Phi = \tilde{f} \circ D$. Also,
$D : I \to \mathbb{R}^n$ and $\tilde{f} : \mathbb{R}^n \to \mathbb{R}$ for some $n \in \mathbb{N}$. Furthermore, by Lemma 2.3.4, $g$ is an order-preserving
fitness scaling function. Thus, by the definition of an order-based selection operator, $p_{sel}(a; s_{(\Theta, \Phi)}, P) =
p_{sel}(a; s_{(\Theta, \tilde{f} \circ D)}, P) = p_{sel}(a; s_{(\Theta, g \circ \tilde{f} \circ D)}, P) = p_{sel}(a; s_{(\Theta, g \circ \Phi)}, P)$. Because $\Theta$, $\Phi$, $P$, $a$, and $g$ are arbitrary, $s$
is strictly invariant.

"Only if": Suppose that $s$ is strictly invariant. Let $\Theta \in X$, $D : I \to \mathbb{R}^n$ for some $n \in \mathbb{N}$, $f : \mathbb{R}^n \to \mathbb{R}$,
$T_s : \mathbb{R} \to \mathbb{R}$ an order-preserving fitness scaling function, $P \in I^\mu$, and $a \in P$. Also, let $p : \mathbb{R} \to \mathbb{R}$ be
the identity mapping, and define $\Phi = p \circ f \circ D$. Then $f \circ D = p \circ f \circ D = \Phi$. Also, by Lemma 2.3.4, $T_s$
is strictly increasing. Thus, by the definition of a strictly invariant selection operator, $p_{sel}(a; s_{(\Theta, f \circ D)}, P) =
p_{sel}(a; s_{(\Theta, \Phi)}, P) = p_{sel}(a; s_{(\Theta, T_s \circ \Phi)}, P) = p_{sel}(a; s_{(\Theta, T_s \circ f \circ D)}, P)$. Because $\Theta$, $D$, $f$, $T_s$, $P$, and $a$ are arbitrary,
$s$ is order-based. ■
In light of this theorem, it is not surprising that in practice algorithms which use order-based selection
operators  rarely  use  (nontrivial)  fitness  scaling  functions.  This  section  concludes  with  the  observation  that 
order-based  selection  operators  are  necessarily  scale  and  translation  invariant. 

Corollary  2.5.6  Let  s  be  an  order-based  selection  operator.  Then  s  is  scale  invariant  and  translation 
invariant. 

Proof: By Theorem 2.5.5, $s$ is strictly invariant. By Theorem 2.5.4, $s$ is scale invariant and translation
invariant.  ■ 
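
The equivalence can be illustrated with a simple order-based rule. The sketch below computes, exactly, the probability that each individual wins a single binary tournament. The tournament model is an assumption made for illustration only (both contestants drawn uniformly with replacement, ties split evenly) and is not the thresholded operator defined later in this chapter; the quantity computed is a per-tournament winning probability rather than the selection probability of Definition 2.3.11.

```python
from fractions import Fraction
from itertools import product


def tournament_win_probabilities(fitnesses):
    """Exact probability that each individual wins one binary tournament (sketch)."""
    n = len(fitnesses)
    wins = [Fraction(0)] * n
    pair = Fraction(1, n * n)             # probability of each ordered pair (i, j)
    for i, j in product(range(n), repeat=2):
        if fitnesses[i] > fitnesses[j]:
            wins[i] += pair
        elif fitnesses[i] < fitnesses[j]:
            wins[j] += pair
        else:                             # tie, including i == j: split evenly
            wins[i] += pair / 2
            wins[j] += pair / 2
    return wins


original = [1, 2, 4]
transformed = [x ** 3 for x in original]  # a strictly increasing transformation
print(tournament_win_probabilities(original) == tournament_win_probabilities(transformed))
```

Only comparisons between fitness values enter the computation, so any strictly increasing transformation of the fitness function leaves the result unchanged.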

2.6 Linkage-Friendly Genetic Algorithms

The effectiveness of the simple genetic algorithm with respect to a given application depends on the
specified decoding function. In particular, the effectiveness depends on the "order" in which the genes are
mapped to the object variables.‡ The effectiveness also depends on the specified fitness scaling function.
These dependencies lead researchers to consider another class of evolutionary algorithms, which lack these
dependencies. In this research, these algorithms are collectively called linkage-friendly genetic algorithms
(lfGAs).§

Historically, the dependence of the simple genetic algorithm's effectiveness on the decoding function
motivated Goldberg et al. [36] to propose the messy genetic algorithm (mGA). Later, efficiency considerations
motivated the development of the fast messy genetic algorithm (fmGA) [35]. More recently, Kargupta
extended the fmGA to give explicit consideration to the equivalence class competitions conducted, resulting
in the gene expression messy genetic algorithm (gemGA) [45]. The representation scheme shared by the mGA
and fmGA (Section 2.6.1), as well as the representation scheme of the gemGA, is such that the effectiveness
of each algorithm is independent of the "order" in which genes are mapped to object variables.

‡This fact follows immediately from Holland's Schema Theorem [43].

§The term "linkage-friendly genetic algorithms" is due to Goldberg [30].
The recombination, mutation, and selection operators used by the mGA and fmGA are discussed
in Section 2.6.2. The selection operator used by both algorithms (and by the gemGA) is such that the
effectiveness of each is independent of the fitness scaling function. The general evolutionary algorithm
framework developed in Section 2.3 is used to formally specify the mGA and fmGA (Sections 2.6.3 and
2.6.4, respectively). The section concludes with a review of existing fmGA parameter selection techniques
(Section 2.6.5).

2.6.1  Representation.  Linkage-friendly  genetic  algorithms  as  defined  in  this  research  share  a 
common  representation  scheme.  In  contrast  to  the  representation  scheme  used  in  simple  genetic  algorithms, 
loci  are  represented  explicitly  and  individuals  are  not  necessarily  of  uniform  length. 

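Definition 2.6.1 (Linkage-friendly genetic algorithm (lfGA) individual space): Let $A$ be a non-empty
set (the genic alphabet), $\ell \in \mathbb{Z}^+$ (the nominal string length), $\mathcal{L} = \{1, \ldots, \ell\}$ (the loci), and $o \in \mathbb{R}$
such that $o > 1$ (the overflow factor). Then

$$I = \bigcup_{\lambda=0}^{\lfloor o \ell \rfloor} (A \times \mathcal{L})^\lambda \cong \bigcup_{\lambda=0}^{\lfloor o \ell \rfloor} (A^\lambda \times \mathcal{L}^\lambda)$$

is called an lfGA individual space over $A$. □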

Each $a_i \in A$ is an allele, each $l_i \in \mathcal{L}$ is a locus (plural loci), and each ordered pair $(a_i, l_i)$ is a gene (c.f.
Section 2.4.1). Thus, an lfGA individual $x \in I$ may be viewed as a vector $((a_1, l_1), \ldots, (a_\lambda, l_\lambda))$ of allele-locus
pairs for some $\lambda \in \{0, \ldots, \lfloor o \cdot \ell \rfloor\}$ (the string length or individual length). Alternatively, an lfGA individual
may be viewed as an ordered pair of equal length vectors $x = (\mathbf{a}, \mathbf{l}) = ((a_1, \ldots, a_\lambda), (l_1, \ldots, l_\lambda)) \in A^\lambda \times \mathcal{L}^\lambda$.
This research uses the two views interchangeably as convenient.

Given an individual $(\mathbf{a}, \mathbf{l}) = ((a_1, \ldots, a_\lambda), (l_1, \ldots, l_\lambda))$, a locus $i \in \mathcal{L}$ may occur zero, one, or more times in
$\mathbf{l}$. This implies that individuals need not completely specify a candidate solution, and also that individuals
may overspecify components of candidate solutions. In non-overspecified individuals each locus occurs no
more than once (i.e. $l_i = l_j \implies i = j$), hence such individuals have lengths $\lambda \in \{0, \ldots, \ell\}$. It is convenient
to define the set of length $\lambda$ non-overspecified individuals

$$I(\lambda) \triangleq \{ (\mathbf{a}, \mathbf{l}) = ((a_1, \ldots, a_\lambda), (l_1, \ldots, l_\lambda)) \in I : l_i = l_j \implies i = j \} . \tag{2}$$

An individual $(\mathbf{a}, \mathbf{l})$ is fully specified if each locus occurs exactly once, i.e. if $(\forall i \in \mathcal{L})(\exists! j \in \mathcal{L})[l_j = i]$. The
set of fully specified individuals is thus $I_F = I(\ell)$. There is of course a "natural decoding" $\Gamma_F : I_F \to A^\ell$
which, given a fully specified individual, produces an $\ell$-vector of alleles representing a candidate solution.¶
More generally, given a fully specified individual $c \in I_F$, referred to as a competitive template, the overlay
mapping associates every individual $x \in I$ (fully specified or otherwise) with an $\ell$-vector of alleles.

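Definition 2.6.2 (Overlay mapping): Let $I$ be an lfGA individual space over the genic alphabet $A$ with
nominal string length $\ell$, and $I_F = I(\ell)$ defined by Equation 2. The mapping $\Gamma : I \times I_F \to A^\ell$ such that for
each $i \in \mathcal{L}$

$$[\Gamma((\mathbf{a}, \mathbf{l}), (\mathbf{b}, \mathbf{m}))]_i \triangleq \begin{cases} a_j , & \text{if } j = \min\{k : l_k = i\} \text{ exists} \\ b_j \text{ where } m_j = i , & \text{if } \forall k : l_k \neq i \end{cases}$$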


is  called  the  overlay  mapping  for  I .  □ 
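
The overlay mapping can be sketched in Python as follows. The representation is an assumption made for illustration: an individual is a pair of equal-length tuples (alleles, loci) with loci numbered from 1, and the function name and explicit `ell` argument are introduced here. The competitive template supplies a default allele for every locus, and the first occurrence of a locus in the individual overrides that default.

```python
def overlay(individual, template, ell):
    """Overlay an lfGA individual on a fully specified template (sketch)."""
    alleles, loci = individual
    t_alleles, t_loci = template
    result = [None] * ell
    for allele, locus in zip(t_alleles, t_loci):
        result[locus - 1] = allele        # template supplies the default allele
    seen = set()
    for allele, locus in zip(alleles, loci):
        if locus not in seen:             # the first occurrence of a locus wins
            seen.add(locus)
            result[locus - 1] = allele
    return tuple(result)


template = ((0, 0, 0, 0), (1, 2, 3, 4))
individual = ((1, 0, 1), (2, 2, 4))       # locus 2 is overspecified
print(overlay(individual, template, 4))   # (0, 1, 0, 1)
```

In the usage example the second gene for locus 2 is ignored, and loci 1 and 3, which the individual leaves unspecified, take their alleles from the template.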

The association of each individual $x \in I$ with a vector of alleles via the overlay mapping may be
thought  of  as  the  first  step  in  assigning  a  fitness  to  x.  Subsequent  steps  include  mapping  the  vector  of  alleles 
to  the  parameter  space  of  the  objective  function,  evaluation  of  the  objective  function,  and  possibly  fitness 
scaling.  The  composition  of  these  mappings  is  the  IfGA  fitness  function. 

Definition 2.6.3 (Linkage-friendly genetic algorithm (lfGA) fitness function): Let $I$ be an lfGA
individual space over the genic alphabet $A$ with nominal string length $\ell$, $I_F = I(\ell)$ defined by Equation 2,
$\Gamma : I \times I_F \to A^\ell$ the overlay mapping for $I$, $D : A^\ell \to \mathbb{R}^n$ (the lfGA decoding function), $f : \mathbb{R}^n \to \mathbb{R}$ (the
objective function), $T_s : \mathbb{R} \to \mathbb{R}$ (the fitness scaling function), and $\Phi = T_s \circ f \circ D \circ \Gamma : I \times I_F \to \mathbb{R}$. Then
$\Phi(x, c)$ denotes the fitness of $x \in I$ with respect to $c \in I_F$. Furthermore, given $c \in I_F$ define $\Phi_c : I \to \mathbb{R}$
by $\Phi_c(\cdot) = \Phi(\cdot, c)$. Then $\Phi_c$ is called an lfGA fitness function for $I$. □

¶In particular, define $\Gamma_F : I_F \to A^\ell$ such that $[\Gamma_F(\mathbf{a}, \mathbf{l})]_i = a_j$, where $l_j = i$.

Of course, an lfGA fitness function $\Phi_c$ may be written as the composition $T_s \circ f \circ D_c : I \to \mathbb{R}$, where
$D_c(\cdot) = D(\Gamma(\cdot, c))$. Thus, lfGA fitness functions are fitness functions in the sense of Definition 2.3.2. Finally,
description of specific linkage-friendly genetic algorithms is considerably simplified by the following definition.

Definition 2.6.4 ((Order-$k$) potential building block): Let $\mathcal{A}$ be a non-empty set (the genic alphabet),
$\ell \in \mathbb{Z}^+$ (the nominal string length), $\mathcal{L} = \{1, \ldots, \ell\}$ (the loci), and $S = \{(a_1, l_1), \ldots, (a_k, l_k)\} \in 2^{\mathcal{A} \times \mathcal{L}}$ a set
of genes. If the loci of $S$ are distinct, i.e. $S$ satisfies $i = j \iff l_i = l_j$, then it is called an order-$k$ potential
building block or simply a potential building block. □
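As a small illustration of Definition 2.6.4, the following Python sketch checks whether a set of genes has distinct loci. It assumes genes are represented as (allele, locus) tuples; the function names are hypothetical.

```python
def is_potential_building_block(genes):
    """True if no two genes in `genes` share a locus."""
    loci = [locus for _allele, locus in genes]
    return len(loci) == len(set(loci))


def building_block_order(genes):
    """Order k of a potential building block: its number of genes."""
    if not is_potential_building_block(genes):
        raise ValueError("loci are not distinct")
    return len(genes)
```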

2.6.2 Genetic Operators. This section discusses the recombination, mutation, and selection operators
used by the messy genetic algorithm (mGA) and fast messy genetic algorithm (fmGA). Both the
mGA and the fmGA process individuals of non-uniform length, and consequently require a more general
recombination operator than single-point crossover. The recombination operator proposed by Goldberg, et
al. [36] is called the cut-and-splice operator (Section 2.6.2.1).

This research does not formally define the mGA mutation operator, which is analogous to the point
mutation operator of the simple genetic algorithm, because all reported mGA experiments use a zero probability
of mutation. In contrast, the fmGA uses a building block filtering operator (Section 2.6.2.2) which this
research views as a mutation operator.

Finally, Section 2.6.2.3 formally defines the binary tournament selection with thresholding operator,
which  is  order-based  and  used  by  both  the  mGA  and  the  fmGA.  Because  it  is  order-based,  the  effectiveness 
of  each  algorithm  is  independent  of  the  fitness  scaling  function. 

2.6.2.1 Recombination. The individual spaces of linkage-friendly genetic algorithms consist
of individuals of non-uniform length (see Definition 2.6.1). Thus, the single-point crossover operator used
in the simple genetic algorithm is not directly applicable in linkage-friendly genetic algorithms. The
cut-and-splice operator is a recombination operator which processes individuals of non-uniform length and is
otherwise similar to single-point crossover. It is convenient to define the cut-and-splice operator in terms of
the composition of distinct cut and splice operators.


A cut operator maps pairs of individuals (the parents) to 4-tuples of individuals (the fragments). For
$\mathbf{a} = (a_1, \ldots, a_\lambda) \in (\mathcal{A} \times \mathcal{L})^{\lambda}$, the following definition denotes by $\mathbf{a}_{i:j}$ the fragment $(a_i, \ldots, a_j) \in (\mathcal{A} \times \mathcal{L})^{j-i+1}$,
where $1 \le i \le j \le \lambda$. Some fragments may be trivial, i.e. of length 0; these are denoted $\{\}$.⁹

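Definition 2.6.5 (Cut operator): Let $I$ be an lfGA individual space, $\Omega = [0,1]^4$, $\omega = (X_a, X_b, Y_a, Y_b) \sim U(\Omega)$,
and $\kappa : \mathbb{R} \to T(\Omega, T(I^2, I^4))$ an evolutionary operator. If for every $p_c \in [0,1]$ (the cut probability),
every $(\mathbf{a},\mathbf{b}) \in (\mathcal{A} \times \mathcal{L})^{\lambda_a} \times (\mathcal{A} \times \mathcal{L})^{\lambda_b} \subset I^2$ (the parents), $y_a = \lceil (\lambda_a - 1) \cdot Y_a \rceil$ and $y_b = \lceil (\lambda_b - 1) \cdot Y_b \rceil$ (the
cut points), $\kappa$ satisfies

$$\kappa_{p_c}(\mathbf{a},\mathbf{b}) = \begin{cases}
(\mathbf{a}_{1:y_a}, \mathbf{a}_{y_a+1:\lambda_a}, \mathbf{b}_{1:y_b}, \mathbf{b}_{y_b+1:\lambda_b}), & \text{if } \lambda_a \lambda_b > 0,\ X_a \le p_c, \text{ and } X_b \le p_c \\
(\mathbf{a}_{1:y_a}, \mathbf{a}_{y_a+1:\lambda_a}, \mathbf{b}, \{\}), & \text{if } \lambda_a > 0,\ X_a \le p_c, \text{ and either } \lambda_b = 0 \text{ or } X_b > p_c \\
(\mathbf{a}, \mathbf{b}_{1:y_b}, \mathbf{b}_{y_b+1:\lambda_b}, \{\}), & \text{if } \lambda_b > 0,\ X_b \le p_c, \text{ and either } \lambda_a = 0 \text{ or } X_a > p_c \\
(\mathbf{a}, \mathbf{b}, \{\}, \{\}), & \text{if either } \lambda_a = 0 \text{ or } X_a > p_c, \text{ and either } \lambda_b = 0 \text{ or } X_b > p_c
\end{cases}$$

then $\kappa$ is called a cut operator. □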

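A splice operator maps 4-tuples of individuals (the fragments) to $n$-tuples of individuals (the offspring),
where $n \in \{2, 3, 4\}$. In the following definition, if $\mathbf{a} = (a_1, \ldots, a_{\lambda_a}) \in (\mathcal{A} \times \mathcal{L})^{\lambda_a}$ and $\mathbf{b} = (b_1, \ldots, b_{\lambda_b}) \in (\mathcal{A} \times \mathcal{L})^{\lambda_b}$
are fragments, then the offspring $(a_1, \ldots, a_{\lambda_a}, b_1, \ldots, b_{\lambda_b})$ is denoted $\mathbf{ab}$.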


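⁹ This notation is consistent with the view of an lfGA individual $x \in (\mathcal{A} \times \mathcal{L})^n$ as equivalent to the sequence of $n$ allele-locus
pairs which it implicitly defines. By the definition of a sequence (see Apostol [3]), $x$ is then a function, i.e. a set of ordered
pairs $\{(1, x_1), \ldots, (n, x_n)\}$, where each $x_i \in \mathcal{A} \times \mathcal{L}$ is an allele-locus pair. Suppose $n = 0$. Then $x$ is the empty set of ordered
pairs.

Definition 2.6.6 (Splice operator): Let $I$ be an lfGA individual space, $\Omega = [0,1]^3$, $\omega = (X_{ab}, X_{bc}, X_{cd}) \sim U(\Omega)$,
and $\zeta : \mathbb{R} \to T(\Omega, T(I^4, I^2 \cup I^3 \cup I^4))$ an evolutionary operator. If for every $p_s \in [0,1]$ (the splice
probability), and every $(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d}) \in I^4$ (the fragments), $\zeta$ satisfies

$$\zeta_{p_s}(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d}) = \begin{cases}
(\mathbf{ab}, \mathbf{cd}), & \text{if } X_{ab} \le p_s \text{ and } X_{cd} \le p_s \\
(\mathbf{ab}, \mathbf{c}, \mathbf{d}), & \text{if } X_{ab} \le p_s \text{ and } X_{cd} > p_s \\
(\mathbf{a}, \mathbf{bc}, \mathbf{d}), & \text{if } X_{ab} > p_s \text{ and } X_{bc} \le p_s \\
(\mathbf{a}, \mathbf{b}, \mathbf{cd}), & \text{if } X_{ab} > p_s,\ X_{bc} > p_s, \text{ and } X_{cd} \le p_s \\
(\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}), & \text{if } X_{ab} > p_s,\ X_{bc} > p_s, \text{ and } X_{cd} > p_s
\end{cases}$$

then $\zeta$ is called a splice operator. □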


A local cut-and-splice operator¹⁰ is an evolutionary operator which produces population transformations
expressible as the composition of the population transformations resulting from a cut operator, a permutation
of the resulting fragments (possibly depending on the parameters and random events of the cut operator),
and a splice operator.


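Definition 2.6.7 (Local cut-and-splice operator): Let $I$ be an lfGA individual space, $\Omega = [0,1]^4 \times [0,1]^3$,
$\omega = (\omega_c, \omega_s) \sim U(\Omega)$, $\kappa$ a cut operator, $\sigma : \mathbb{R} \times [0,1]^4 \to \pi_4$, $\zeta$ a splice operator, and
$r' : \mathbb{R}^2 \to T(\Omega, T(I^2, I^2 \cup I^3 \cup I^4))$ an evolutionary operator. If $r'$ satisfies

$$[r'_{(p_c,p_s)}(\omega)](\mathbf{a},\mathbf{b}) = [\zeta_{p_s}(\omega_s)]\left(([\kappa_{p_c}(\omega_c)](\mathbf{a},\mathbf{b}))_{[\sigma(p_c,\omega_c)](1)}, \ldots, ([\kappa_{p_c}(\omega_c)](\mathbf{a},\mathbf{b}))_{[\sigma(p_c,\omega_c)](4)}\right)$$

then $r'$ is called a local cut-and-splice operator.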


¹⁰ With respect to both recombination and mutation operators, Bäck and Schwefel [7] distinguish between “macro-operators”
(equivalent to the “population transformations” defined in this research) and “local operators,” which map populations to
individuals. Informally speaking, local operators capture the low-level, essential behavior of the corresponding macro-operators.
Consequently, specific recombination and mutation operators are often defined in terms of local operators.

Strictly speaking, the local cut-and-splice operator defined in this research is not a local operator in the sense of Bäck and
Schwefel, because it produces more than one individual.

The permutation mapping $\sigma$ in Definition 2.6.7 is arbitrary. Different mappings correspond to different local
cut-and-splice operators and result in different sets of potential offspring. Goldberg, et al. [36] propose a local
cut-and-splice operator, for which the potential sets of nontrivial offspring¹¹ are illustrated in Figure 9. The
permutation mapping of Goldberg's local cut-and-splice operator is intended to closely resemble the behavior
of single-point crossover for individuals of length close to the nominal string length.

Figure 9. Potential Nontrivial Offspring Resulting From Goldberg's Local Cut-and-splice Operator

Definition 2.6.8 (Goldberg's local cut-and-splice operator): Define $\sigma : \mathbb{R} \times [0,1]^4 \to \pi_4$ by

$$\sigma(p_c, \omega_c) = \begin{cases}
(1, 4, 3, 2), & \text{if } X_a \le p_c \text{ and } X_b \le p_c \\
(1, 2, 3, 4), & \text{if } X_a > p_c \text{ and } X_b > p_c \\
(1, 3, 2, 4), & \text{otherwise}
\end{cases}$$

Let $I$, $\Omega$, $\omega$, $\kappa$, and $r'$ be as in Definition 2.6.7. Then $r'$ is called Goldberg's local cut-and-splice operator. □

¹¹ In practice, only nontrivial individuals are included in the offspring population.

A cut-and-splice operator is an evolutionary operator which extends a local cut-and-splice operator
to operate on populations of arbitrary size (i.e. a macro-operator corresponding to a local cut-and-splice
operator). In contrast to the situation with single-point crossover, for which every pair of parents results in
exactly two offspring, a local cut-and-splice operator probabilistically results in between 1 and 4 offspring
for each pair of parents. Because of this uncertainty, it is convenient to recursively define the population
produced by a cut-and-splice operator.
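The cut, permute, and splice steps for a single pair of parents can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the operator as implemented by Goldberg, et al.: individuals are Python lists, the seven uniform draws are passed in explicitly so the behavior is reproducible, cut points are rounded up, and trivial (empty) offspring are discarded at the end. All function names are hypothetical.

```python
import math


def cut(a, b, p_c, x_a, x_b, y_a, y_b):
    """Cut each non-empty parent with probability p_c; return four fragments."""
    cut_a = len(a) > 0 and x_a <= p_c
    cut_b = len(b) > 0 and x_b <= p_c
    # Assumption: cut points are rounded up (ceiling).
    pos_a = math.ceil((len(a) - 1) * y_a) if cut_a else 0
    pos_b = math.ceil((len(b) - 1) * y_b) if cut_b else 0
    if cut_a and cut_b:
        return [a[:pos_a], a[pos_a:], b[:pos_b], b[pos_b:]]
    if cut_a:
        return [a[:pos_a], a[pos_a:], b, []]
    if cut_b:
        return [a, b[:pos_b], b[pos_b:], []]
    return [a, b, [], []]


def goldberg_permutation(p_c, x_a, x_b):
    """Fragment ordering applied between the cut and the splice."""
    if x_a <= p_c and x_b <= p_c:
        return (1, 4, 3, 2)
    if x_a > p_c and x_b > p_c:
        return (1, 2, 3, 4)
    return (1, 3, 2, 4)


def splice(fragments, p_s, x_ab, x_bc, x_cd):
    """Join adjacent fragments, each join attempted with probability p_s."""
    a, b, c, d = fragments
    if x_ab <= p_s:
        return (a + b, c + d) if x_cd <= p_s else (a + b, c, d)
    if x_bc <= p_s:
        return (a, b + c, d)
    if x_cd <= p_s:
        return (a, b, c + d)
    return (a, b, c, d)


def local_cut_and_splice(a, b, p_c, p_s, draws):
    """Apply cut, permutation, and splice; keep only nontrivial offspring."""
    x_a, x_b, y_a, y_b, x_ab, x_bc, x_cd = draws
    fragments = cut(a, b, p_c, x_a, x_b, y_a, y_b)
    sigma = goldberg_permutation(p_c, x_a, x_b)
    permuted = [fragments[i - 1] for i in sigma]
    return [child for child in splice(permuted, p_s, x_ab, x_bc, x_cd) if child]
```

In a stochastic run the seven draws would come from a uniform generator, for example `[random.random() for _ in range(7)]`. With both parents cut and both splices made, the result matches single-point crossover with independent cut points.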

Definition  2.6.9  (Cut-and-splice  operator):  Let  I  be  an  IfGA  individual  space,  /x  6  Z+  (the  parent 
population  size/,  /i'  €  Z+  (the  offspring  population  size/  ^  x  ([0,1]^  x  [0,1]^)'"'.  uj  = 

((‘’"i . . ~  U[Q.),  r'  a  local  cut-and-splice  operator,  r  6  SVOV(I,p,,R‘^,Q).  and 


f{P':io,i,j.k:P,pc,p3,u)  =  < 


P' 

f{P':io.iQ.j  -  l.kiP.pc.ps.oj) 

f( 

“  1.,  k 

P,Pc,Ps,U> 

P'U{Qu...,Qk} 

P'^iQl . Qdimq}; 

^0,^  -  dim  Q.j.k  -  dimQ; 

P,PC,P3^UJ 


,  if  k  =  0 

.  if  k  >  0  and  i  =  0 


.  if  k  >  0  and  i  =  1 
.  if  0  <  k  <  A  and  i  >  1 


,  if  k  >  Q  and  i  >  1 


(3) 


where $Q = (Q_1, \ldots, Q_{\dim Q})$ denotes the offspring of an invocation of $r'$.
If for every $p_c \in [0,1]$ (the cut probability), every $p_s \in [0,1]$ (the splice probability), and every $P \in I^{\mu}$
(the parent population), $r$ satisfies

then $r$ is called a cut-and-splice operator. If $r'$ is Goldberg's local cut-and-splice operator, then $r$ is called
Goldberg's cut-and-splice operator. □

2.6.2.2 Mutation. The fast messy genetic algorithm uses a building block filtering (BBF)
operator, which this research views as a mutation operator. The resulting population transformations map
parent populations $P \subset I(\lambda_o)$ to offspring populations $P' \subset I(\lambda_f)$ where $\lambda_f \le \lambda_o$. The mapping “deletes”
$\lambda_o - \lambda_f$ randomly chosen genes from each individual in $P$. The genes to be deleted from each individual are
chosen uniformly without replacement. Equivalently, the genes to be retained are chosen uniformly without
replacement. It is convenient to define the BBF operator in terms of the local BBF operator.

Definition 2.6.10 (Local building block filtering operator): Let $I$ be an lfGA individual space over
genic alphabet $\mathcal{A}$ with nominal string length $\ell$,

=  |(T  e  T  ^{0, . . .  ,f}.  1^  TTi j  :  a(i)  £  -Wi 

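$\omega \sim U(\Omega)$, and $m' \in \mathcal{EVOP}(I, 1, \{0, \ldots, \ell\}, \Omega)$ an evolutionary operator. If for every $\lambda_f \in \{0, \ldots, \ell\}$ (the
offspring individual length), and every $\mathbf{a} = ((a_1, l_1), \ldots, (a_{\lambda_o}, l_{\lambda_o})) \in I$, $m'$ satisfies

$$[m'_{\lambda_f}(\omega)](\mathbf{a}) = \left((a_{[\omega(\lambda_f)](1)}, l_{[\omega(\lambda_f)](1)}), \ldots, (a_{[\omega(\lambda_f)](\lambda_f)}, l_{[\omega(\lambda_f)](\lambda_f)})\right),$$

then $m'$ is called a generalized local building block filtering operator. □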


A  local  BBF  operator  is  illustrated  in  Figure  10. 


Figure  10.  Local  Building  Block  Filtering  Operator 


Definition 2.6.11 (Building block filtering operator): Let $I$ be an lfGA individual space with
nominal string length $\ell$, $\mu = \mu' \in \mathbb{Z}^+$ (the population size), $\lambda_o$ (the parent individual length),
$I(\lambda_o)$ defined by Equation 2, $\Omega = ([0,1]^{\lambda_o})^{\mu'}$, $\omega = (\omega_1, \ldots, \omega_{\mu'}) \sim U(\Omega)$, $m'$ a local BBF operator, and
$m : \mathbb{N} \to T(\Omega, T((I(\lambda_o))^{\mu}, (I(\lambda_f))^{\mu'}))$ an evolutionary operator. If for every $\lambda_f \in \{0, \ldots, \lambda_o\}$ (the offspring
individual length), every $P \in (I(\lambda_o))^{\mu}$ (the parent population), and every $i \in \{1, \ldots, \mu'\}$, $m$ satisfies


then $m$ is called a building block filtering operator. □

Because the offspring individual lengths $\lambda_f$ are deterministic (and identical for all individuals in the offspring
population),  this  research  sometimes  refers  to  building  block  filtering  operators  as  deterministic  building  block 
filtering  operators.  This  is  in  contrast  to  probabilistic  building  block  filtering  operators,  which  are  defined  in 
Chapter  III. 
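A deterministic building block filtering step can be sketched in Python as follows. The sketch assumes individuals are lists of (allele, locus) pairs and that the retained genes keep their original relative order; the order-preservation choice and the function names are assumptions made for illustration, not details confirmed by the definitions above.

```python
import random


def local_bbf(individual, lam_f, rng=random):
    """Retain lam_f genes chosen uniformly without replacement."""
    if not 0 <= lam_f <= len(individual):
        raise ValueError("offspring length must lie in 0..len(individual)")
    # Sample positions to keep, then restore their original order (assumption).
    keep = sorted(rng.sample(range(len(individual)), lam_f))
    return [individual[i] for i in keep]


def bbf(population, lam_f, rng=random):
    """Filter every individual in the population to the same length lam_f."""
    return [local_bbf(individual, lam_f, rng) for individual in population]
```

Passing a seeded `random.Random` instance as `rng` makes a filtering schedule reproducible.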


2.6.2.3 Selection. Both the messy genetic algorithm and the fast messy genetic algorithm
use a selection operator called tournament selection. In its most general form, tournament selection can be
described as follows:

1. Randomly draw $q$ competing individuals from the current population.

2. Rank the competing individuals according to fitness.

3. Randomly draw one of the individuals (the winner) and include it in the next population.

By far the most frequently encountered form of tournament selection is binary tournament selection (BTS),
for which $q = 2$.

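Definition 2.6.12 (Binary tournament selection operator): Let $I$ be a non-empty set (the individual
space), $\mu \in \mathbb{Z}^+$ (the parent population size), $\mu' \in \mathbb{Z}^+$ (the offspring population size), $\Omega = (\{1, \ldots, \mu\}^2)^{\mu'}$,
$\omega = ((\omega_0(1), \omega_1(1)), \ldots, (\omega_0(\mu'), \omega_1(\mu'))) \sim U(\Omega)$, and $s \in \mathcal{EVOP}(I, \mu, T(I, \mathbb{R}), \Omega)$. If for every fitness
function $\Phi : I \to \mathbb{R}$ and every population $P \in I^{\mu}$, $s$ satisfies

$[s_\Phi(P)]_i =$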

then  s  is  called  a  binary  tournament  selection  operator.  □ 
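A minimal Python sketch of binary tournament selection follows. It assumes that the two competitors are drawn uniformly with replacement, that the fitter one wins, and that ties go to the first competitor; the tie rule and the function name are assumptions for illustration only.

```python
import random


def binary_tournament_selection(population, fitness, n_offspring, rng=random):
    """Select n_offspring winners of independent two-way tournaments."""
    offspring = []
    for _ in range(n_offspring):
        first = rng.choice(population)
        second = rng.choice(population)
        # Only the ordering of the two fitness values matters.
        offspring.append(first if fitness(first) >= fitness(second) else second)
    return offspring
```

Because the outcome depends only on fitness comparisons, running the sketch with the same random draws under any strictly increasing rescaling of the fitness function selects the same individuals.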

Many  variations  of  tournament  selection  are  in  common  use.  Some  variations  differ  in  the  method  by 
which  the  competing  individuals  are  drawn  from  the  population.  If  they  are  drawn  without  replacement, 
then they are typically drawn from a single “copy” of the population.

Other  variations  differ  in  the  method  by  which  the  winner  is  drawn  from  the  competing  individuals. 
Typically,  the  winner  is  the  most  fit  of  the  competing  individuals  (in  which  case  implementation  of  the 
ranking step is unnecessary). Variations in which the winner is chosen according to some (non-trivial)
probability  density  function  defined  on  the  rankings  are  called  probabilistic  tournament  selection  [33]. 

Finally, some variations use thresholding, which restricts the choice of competing individuals to those
which are compatible with each other [36]. Individuals are considered compatible if they are sufficiently
similar.

Definition 2.6.13 (Individual similarity, $\theta$-compatible): Let $I$ be a non-empty set (the individual
space). Then a mapping $d : I^2 \to \mathbb{N}$ is called an individual similarity. Let $\mathbf{a}, \mathbf{b} \in I$ and $\theta : I^2 \to \mathbb{N}$ (the
threshold mapping). If $d(\mathbf{a},\mathbf{b}) \ge \theta(\mathbf{a},\mathbf{b})$ then $\mathbf{a}$ and $\mathbf{b}$ are $\theta$-compatible. □

For efficiency, implementations of BTS typically consider a maximum of $n_{sh} \in \mathbb{Z}^+$ (the shuffle size) individuals
in seeking a compatible second individual.

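Definition 2.6.14 (Binary tournament selection with thresholding operator and finite shuffle
size $n_{sh}$): Let $I$ be a non-empty set (the individual space), $\ell \in \mathbb{Z}^+$, $\mu \in \mathbb{Z}^+$ (the parent population size),
$\mu' \in \mathbb{Z}^+$ (the offspring population size), $n_{sh} \in \mathbb{Z}^+$ (the shuffle size), $\Omega = (\{1, \ldots, \mu\}^{n_{sh}+1})^{\mu'}$,

$$\omega = ((\omega_0(1), \ldots, \omega_{n_{sh}}(1)), \ldots, (\omega_0(\mu'), \ldots, \omega_{n_{sh}}(\mu'))) \sim U(\Omega),$$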

d  an  individual  similarity,  and  s  e  SVCPII.  fj,,T(P.  N)  x  T(/,  Also,  define  j  :  {1, _ /*'}  x 

T(/2.N)  — .{0,....n,4  by 


j{h^) 

If  for  every  6  :  P  - 
P  £  s  satisfies 


^  min{A; :  d(P„„(j),P„^(j))  >  .otherwise 

N  (the  threshold  mapping  j,  every  fitness  function  $  ;  /  — -  R.  and  every  population 


[s{»,i){P)]i  = 


?  otherwise 


then  s  is  called  a  binary  tournament  selection  with  thresholding  operator. 
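The role of the threshold and the shuffle size can be illustrated with a short Python sketch. It is a hedged approximation rather than a transcription of Definition 2.6.14: candidates are drawn with replacement, at most `n_sh` candidates are examined, and when no compatible candidate is found the first individual is assumed to advance unchallenged. The function name and these fallback rules are assumptions.

```python
import random


def bts_with_thresholding(population, fitness, similarity, threshold,
                          n_offspring, n_sh, rng=random):
    """Binary tournaments restricted to sufficiently similar competitors."""
    offspring = []
    for _ in range(n_offspring):
        first = rng.choice(population)
        winner = first
        for _ in range(n_sh):
            candidate = rng.choice(population)
            # Compatible when the similarity reaches the threshold.
            if similarity(first, candidate) >= threshold(first, candidate):
                if fitness(candidate) > fitness(first):
                    winner = candidate
                break
        offspring.append(winner)
    return offspring
```

With a threshold that is always met, the sketch reduces to plain binary tournament selection; with a threshold that is never met, it copies randomly drawn individuals without any selection pressure.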


Because each is order-based, BTS and BTS with thresholding are examples of strictly invariant selection
operators (see Section 2.5). Consequently, for evolutionary algorithms using either BTS or BTS with
thresholding, effectiveness is unaffected by the choice of (order-preserving) fitness scaling function.

2.6.3 Messy Genetic Algorithms. The dependence of the simple genetic algorithm's effectiveness
on the “order” in which the genes are mapped to the object variables motivated Goldberg, et al. to propose
the messy genetic algorithm (mGA) [31, 32, 36]. The mGA uses the order-invariant representation scheme
defined in Section 2.6.1, as well as the strictly invariant BTS with thresholding operator. The algorithm
is designed to obtain, with high probability, an order-$k$ optimal individual (i.e. a fully-specified individual
for which the fitness cannot be improved by changing $k$ or fewer alleles), given an order-$(k-1)$ optimal
competitive template. The parameter $k$ is called the building block size.


Goldberg, et al. [31] suggest that the algorithm be applied iteratively for $1 \le k \le k_{max}$, using the best
individual found in iteration $k - 1$ as the competitive template for iteration $k$. They also suggest [36] that
$k_{max}$ be “chosen to encompass the highest order deceptive nonlinearity suspected in the subject problem.”¹²
Such an estimate is typically not available. This author suggests that $k_{max}$ must be viewed as controlling a
tradeoff between expected solution quality (effectiveness) and execution time (efficiency).

The mGA consists of the initialization, primordial, and juxtapositional phases (see Figure 11). In the
initialization phase, a deterministic technique called Partially Enumerative Initialization (PEI) produces an
initial population containing at least one “copy” of each order-$k$ potential building block. That is, for each
order-$k$ potential building block $\{(a_1, l_1), \ldots, (a_k, l_k)\}$, the initial population contains at least one individual
of the form $\mathbf{a} = ((a_{\sigma(1)}, l_{\sigma(1)}), \ldots, (a_{\sigma(k)}, l_{\sigma(k)}))$, where $\sigma$ is a permutation on $\{1, \ldots, k\}$. The usual initial


¹² The class of deceptive functions may be defined as follows (without loss of generality, a maximization problem is assumed).
Let $I = \{0,1\}^{\ell}$, $\mathcal{L} = \{1, \ldots, \ell\}$, $\Phi = f \circ D : I \to \mathbb{R}$ such that there exists a global maximum $f(x)$ of $f$, $D$ is one-to-one, and
$(\exists \mathbf{a} \in I)[D(\mathbf{a}) = x]$. Also, for each schema (plural schemata) $\mathbf{h} = (h_1, \ldots, h_\ell) \in \{0, 1, \#\}^{\ell}$ define $o(\mathbf{h}) = \mathrm{card}(\{i \in \mathcal{L} : h_i \neq \#\})$
and

$$S(\mathbf{h}) = \{\mathbf{a} = (a_1, \ldots, a_\ell) \in I : (\forall i \in \mathcal{L})[h_i = \# \lor a_i = h_i]\} .$$

If $S(\mathbf{h})$ contains $\mathbf{a}$ and the individuals in $S(\mathbf{h})$ have lower average fitness than the individuals in $S(\mathbf{h}')$ for each of the “competing”
schemata $\mathbf{h}'$, i.e. if

$$(\mathbf{a} \in S(\mathbf{h})) \land (\mathbf{a} \notin S(\mathbf{h}')) \land (h_i = \# \iff h'_i = \#) \implies \frac{\sum_{\mathbf{b} \in S(\mathbf{h})} \Phi(\mathbf{b})}{\mathrm{card}(S(\mathbf{h}))} < \frac{\sum_{\mathbf{b} \in S(\mathbf{h}')} \Phi(\mathbf{b})}{\mathrm{card}(S(\mathbf{h}'))}$$

then $f$ is called deceptive with respect to $\mathbf{h}$. The order of deceptiveness of $f$ is $\max\{o(\mathbf{h}) : f \text{ is deceptive w.r.t. } \mathbf{h}\}$, and $f$ is
order-$k$ fully deceptive if $f$ is deceptive with respect to every schema $\mathbf{h}$ such that $\mathbf{a} \in S(\mathbf{h})$ and $o(\mathbf{h}) < k$.

Grefenstette [39] shows “that deception is neither necessary nor sufficient to make a problem difficult for GAs.” This result
in no way argues against the use of either order-invariant representations or strictly invariant selection operators.


Figure  11.  Messy  Genetic  Algorithm  Flow  Chart 


population is

$$P(0) = \{((a_1, l_1), \ldots, (a_k, l_k)) \in I(k) : l_1 < \cdots < l_k\} .$$

For a nominal string length $\ell$, and a necessarily finite genic alphabet $\mathcal{A}$, the initial population contains

$$\mu^{(0)} = [\mathrm{card}(\mathcal{A})]^k \binom{\ell}{k}$$

individuals. Consequently, for the usual case of $\ell \gg k$, the algorithmic complexity of the initialization phase
is $O([\ell \cdot \mathrm{card}(\mathcal{A})]^k)$, which is also the complexity of the overall algorithm. For $k > 3$, $\mu^{(0)}$ is much larger
than typical simple genetic algorithm population sizes.
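The size of the enumerated initial population can be checked numerically. The Python sketch below assumes a binary alphabet by default and builds the population by enumerating every increasing choice of $k$ loci together with every allele assignment; the function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def pei_population_size(ell, k, alphabet_size=2):
    """Number of individuals: card(A)^k times (ell choose k)."""
    return alphabet_size ** k * comb(ell, k)


def pei_population(ell, k, alphabet=(0, 1)):
    """Enumerate all order-k individuals with strictly increasing loci."""
    population = []
    for loci in combinations(range(1, ell + 1), k):
        for alleles in product(alphabet, repeat=k):
            population.append(list(zip(alleles, loci)))
    return population
```

For a 30-bit problem with $k = 3$ the formula already gives 32,480 individuals, which illustrates how quickly the initialization cost grows with the building block size.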

The primordial phase is designed to transform the initial population into a population of individuals
$P(t_p) \subset I(k)$ which can be processed effectively and efficiently in the juxtapositional phase. The only operator
used in the primordial phase is binary tournament selection with thresholding, with periodically decreasing
offspring population sizes.¹³ The individual similarity $d$ used by the mGA is such that, for each pair of

¹³ Goldberg, et al. suggest that the competing individuals be drawn without replacement [36].

$d(\mathbf{a},\mathbf{b}) = \mathrm{card}(\{7, 26, 8, 18, 2, 3, 25, 31, 10, 1, 21\}) = 11$

Figure 12. Two mGA individuals are depicted as vectors of allele-locus pairs. The individual similarity is
the number of loci which occur at least once in each individual.

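individuals $(\mathbf{a}, \mathbf{l}) = ((a_1, \ldots, a_{\lambda_a}), (l_1, \ldots, l_{\lambda_a})) \in \mathcal{A}^{\lambda_a} \times \mathcal{L}^{\lambda_a}$ and $(\mathbf{b}, \mathbf{m}) = ((b_1, \ldots, b_{\lambda_b}), (m_1, \ldots, m_{\lambda_b})) \in \mathcal{A}^{\lambda_b} \times \mathcal{L}^{\lambda_b}$,

$$d((\mathbf{a},\mathbf{l}), (\mathbf{b},\mathbf{m})) = \mathrm{card}(\{l_1, \ldots, l_{\lambda_a}\} \cap \{m_1, \ldots, m_{\lambda_b}\}) .$$

This is illustrated in Figure 12. The number of common defining loci of individuals $\mathbf{a} \sim U(I(\lambda_a))$ and
$\mathbf{b} \sim U(I(\lambda_b))$ is a random variable $X$ with the hypergeometric probability density function $h(\cdot\,; \lambda_a, \lambda_b, \ell)$.
Individuals of length $\lambda_a$ and $\lambda_b$ are considered compatible if they share at least $E[X] = \lambda_a \lambda_b / \ell$ common
defining loci.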
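The similarity measure and the expected-overlap threshold can be sketched in Python. The sketch assumes individuals are lists of (allele, locus) pairs, rounds the threshold up to an integer, and, for the probability mass function, assumes each individual has distinct loci; the function names are hypothetical.

```python
from math import comb


def similarity(a, b):
    """Number of loci that occur at least once in both individuals."""
    return len({locus for _, locus in a} & {locus for _, locus in b})


def mga_threshold(len_a, len_b, ell):
    """Expected number of common loci, rounded up to an integer."""
    return -(-(len_a * len_b) // ell)


def common_loci_pmf(x, len_a, len_b, ell):
    """Hypergeometric probability of exactly x common loci (distinct loci assumed)."""
    return comb(len_a, x) * comb(ell - len_a, len_b - x) / comb(ell, len_b)
```

Summing `x * common_loci_pmf(x, len_a, len_b, ell)` over the possible values of `x` recovers the mean $\lambda_a \lambda_b / \ell$ used as the threshold.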

Finally, the juxtapositional phase uses Goldberg's cut-and-splice operator, as well as BTS with thresholding.
Cut and splice probabilities are chosen to promote rapid increase of the individual length from $k$ to
$\ell$ [36]. The individual similarity and threshold mapping are the same as those used in the primordial phase.

Definition  2.6.15  (Messy  genetic  algorithm):  Let 

•  $I$ be an lfGA individual space over the genic alphabet $\{0, 1\}$ with nominal string length $\ell$ and overflow
factor $o$,

•  $I(\lambda)$ defined by Equation 2,

•  $k \in \{1, \ldots, \ell\}$ (the building block size),

•  $t_f \in \mathbb{Z}^+$ (the final generation),

•  $t_p \in \{0, \ldots, t_f\}$ (the final primordial phase generation),

•  $\{\mu^{(t)}\} \subset \mathbb{Z}^+$ a non-increasing sequence (the parent population sizes),

•  for  t  €  {0 _ -  1}  (the  offspring  population  sizes^, 

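•  $c \in I_p = I(\ell)$ (the competitive template),

•  $\Phi_c : I \to \mathbb{R}$ an lfGA fitness function,

•  $\iota$, a $\{\text{true}, \text{false}\}$-valued mapping on sequences of populations (the termination criterion), such that

$$\iota(\{P(0), \ldots, P(t)\}) = \text{true} \iff \mathrm{card}(\{P(0), \ldots, P(t)\}) > t_f ,$$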

•  r  a  sequence  of  Goldberg's  cut- and- splice  operators  r^^^  :  ►  T 

•  $m$ a sequence $\{m^{(t)}\}$ of identity evolutionary operators,

•  $s$ a sequence $\{s^{(t)}\}$ of BTS with thresholding operators

:  T(/^N)  X  T{IM}  T  , 

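•  $p_c^{(t)} = p_s^{(t)} = 0$ for $0 \le t < t_p$,

•  $\Theta_r^{(t)} = (p_c^{(t)}, p_s^{(t)}) \in \mathbb{R}^2$ for $0 \le t < t_f$ (the cut-and-splice parameters), and

•  the threshold mapping $\theta : I^2 \to \mathbb{N}$ defined such that for $\mathbf{a} \in (\mathcal{A} \times \mathcal{L})^{\lambda_a}$ and $\mathbf{b} \in (\mathcal{A} \times \mathcal{L})^{\lambda_b}$,

$$\theta(\mathbf{a},\mathbf{b}) = \left\lceil \frac{\lambda_a \lambda_b}{\ell} \right\rceil .$$

Then the algorithm shown in Figure 13 is called a messy genetic algorithm.

    $t := 0$;
    initialize $P(0) := \{\mathbf{a}_1(0), \ldots, \mathbf{a}_{\mu^{(0)}}(0)\} = \{\mathbf{a} = ((a_1, l_1), \ldots, (a_k, l_k)) \in I(k) : l_1 < \cdots < l_k\}$;
    while ($\iota(\{P(0), \ldots, P(t)\}) \neq$ true) do
        recombine: $P'(t) := r_{\Theta_r^{(t)}}(P(t))$;
        mutate: $P''(t) := m(P'(t))$;
        select: $P(t+1) := s_{(\theta, \Phi_c)}(P''(t))$;
        $t := t + 1$
    od

Figure 13. Outline of a Messy Genetic Algorithm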
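The control flow of the outline can be expressed as a small Python driver. This is only a structural sketch: the three operators are passed in as callables taking a population and the generation index, and the loop stops once more than $t_f$ populations have been recorded, mirroring the termination criterion. The function name and the operator signatures are assumptions.

```python
def run_generations(initial_population, recombine, mutate, select, t_f):
    """Iterate recombine, mutate, select until t_f generations have elapsed."""
    history = [list(initial_population)]
    t = 0
    # Termination criterion: stop when the history holds more than t_f populations.
    while not len(history) > t_f:
        recombined = recombine(history[t], t)
        mutated = mutate(recombined, t)
        history.append(select(mutated, t))
        t += 1
    return history
```

A primordial phase can be modeled by having `recombine` return its argument unchanged while `t` is below the final primordial generation, which corresponds to zero cut and splice probabilities.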

2.6.4 Fast Messy Genetic Algorithms. The large initial population size of the messy genetic
algorithm and the corresponding algorithmic complexity motivated Goldberg, et al. [35] to propose the fast
messy genetic algorithm (fmGA), which is illustrated in Figure 14. The initial population of the fmGA is
constructed using a technique called Probabilistically Complete Initialization (PCI), which randomly samples
individuals from $I(\ell')$, where $\ell' = \ell - k$. The population size is chosen according to the population sizing
relation of Goldberg, et al. [34], so that each order-$k$ potential building block receives an expected number
of “copies” sufficient to overcome sampling noise with specified probability.¹⁴

The goal of the fmGA primordial phase is the same as that of the mGA primordial phase, i.e. to obtain
a population of individuals $P(t_p) \subset I(k)$, some of which can with high probability be juxtaposed to obtain
an order-$k$ optimal individual. Because the initial population consists of individuals $P(0) \subset I(\ell')$, building
block filtering (BBF) is used to periodically reduce the lengths of the individuals (it is assumed that $\ell' > k$).

Definition 2.6.16 (Fast messy genetic algorithm): Let

•  $I$ be an lfGA individual space over the genic alphabet $\{0, 1\}$ with nominal string length $\ell$ and overflow
factor $o$,

•  $I(\lambda)$ defined by Equation 2,

•  $k \in \{1, \ldots, \ell\}$ (the building block size),

¹⁴ In this context, a “copy” of the order-$k$ potential building block $\{(a_1, l_1), \ldots, (a_k, l_k)\}$ is an individual of the form
$((a_{\sigma(1)}, l_{\sigma(1)}), \ldots, (a_{\sigma(k)}, l_{\sigma(k)}))$, where $\sigma$ is a permutation on $\{1, \ldots, k\}$.

Figure 14. Fast Messy Genetic Algorithm Flow Chart


•  $t_f \in \mathbb{Z}^+$ (the final generation),

•  $t_p \in \{0, \ldots, t_f\}$ (the final primordial phase generation),

•  $\lambda^{(0)} = \ell - k$ (the initial individual length),

•  $\{\lambda^{(t)}\} \subset \{k, \ldots, \ell - k\}$ a non-increasing sequence (the individual lengths),

•  $\alpha \in [0, 1]$ (the probability of selection error),

•  $z_\alpha \in \mathbb{R}$ such that $Z \sim N(0, 1) \Rightarrow \Pr[Z > z_\alpha] = 1 - \alpha$,

•  (the  maxim/um  inverse  signal-to-noise  ratio  per  sub-function  to  he  detected), 

•  jx  —  fi*  ^  ( [ |]  “  1)2^  (the  population  size/, 

•  $c \in I_p = I(\ell)$ (the competitive template),


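•  $\Phi_c : I \to \mathbb{R}$ an lfGA fitness function,


•  $\iota$, a $\{\text{true}, \text{false}\}$-valued mapping on sequences of populations (the termination criterion), such that

$$\iota(\{P(0), \ldots, P(t)\}) = \text{true} \iff \mathrm{card}(\{P(0), \ldots, P(t)\}) > t_f ,$$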

•  r  a  sequence  (r**'}  of  Goldberg’s  cut- and- splice  operators  r'*'  :  — >  T  (fir ’,T  (/''*  ’,P' 

•  $m$ a sequence $\{m^{(t)}\}$ of evolutionary operators,

•  for  0<t<  tp.  m(‘>  :  N  — >  r  (fi*^'.  T  a  BBF  operator, 

•  for $t_p \le t < t_f$, $m^{(t)}$ an identity evolutionary operator,

•  $s$ a sequence $\{s^{(t)}\}$ of BTS with thresholding operators,

5**’  ;  T(/2.N)  X  T(/.R)  T  (fi^*>,T  (/''“',/**'''’))  , 

•  $\Theta_m^{(t)} = \lambda^{(t)}$ for $0 \le t < t_p$ (the filtering parameters),

•  $p_c^{(t)} = p_s^{(t)} = 0$ for $0 \le t < t_p$,

•  $\Theta_r^{(t)} = (p_c^{(t)}, p_s^{(t)}) \in \mathbb{R}^2$ for $0 \le t < t_f$ (the cut-and-splice parameters), and

•  $\theta$ a sequence $\{\theta^{(t)}\}$ of threshold mappings $\theta^{(t)} : I^2 \to \mathbb{N}$.

Then the algorithm shown in Figure 15 is called a fast messy genetic algorithm.

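    $t := 0$;
    initialize $P(0) := \{\mathbf{a}_1(0), \ldots, \mathbf{a}_{\mu^{(0)}}(0)\} \sim U(I(\lambda^{(0)}))$;
    while ($\iota(\{P(0), \ldots, P(t)\}) \neq$ true) do
        recombine: $P'(t) := r_{\Theta_r^{(t)}}(P(t))$;
        mutate: $P''(t) := m_{\Theta_m^{(t)}}(P'(t))$;
        select: $P(t+1) := s_{(\theta^{(t)}, \Phi_c)}(P''(t))$;
        $t := t + 1$;
    od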

Figure 15. Outline of a Fast Messy Genetic Algorithm

The hope is that a balance can be found between the explorative effect of filtering and the exploitative
effect of selection. This balance depends on the specification of the sequences of individual lengths $\{\lambda^{(t)}\}$ and
threshold mappings $\{\theta^{(t)}\}$. Goldberg, et al. suggest the more conservative threshold mappings $\theta^{(t)} : I^2 \to \mathbb{N}$
such that for $(\mathbf{a},\mathbf{b}) \in (\mathcal{A} \times \mathcal{L})^{\lambda_a} \times (\mathcal{A} \times \mathcal{L})^{\lambda_b}$, and independent of $t$,


— -k3i 

1  Xa{l  —  Aa)A6(^  —  A^) 

1 

'  Pii-i) 

This and other existing fmGA parameter selection techniques are discussed in Section 2.6.5. None reliably
obtains  the  necessary  balance  between  convergence  and  disruption  in  practical  applications. 

2.6.5 fmGA Parameter Selection Techniques. The effectiveness of the fast messy genetic algorithm
for a given application depends on a number of design parameters. In particular, experience [28] shows that
the effectiveness of the algorithm is highly sensitive to the sequences of individual lengths $\{\lambda^{(t)}\}$ and threshold
mappings $\{\theta^{(t)}\}$. No previously proposed technique for selection of these parameters [28, 35, 46, 47, 54]
reliably yields satisfactory effectiveness in practical applications.

Each of these techniques is based on the premise that in order to be effective, the algorithm must
produce a final primordial phase population which contains “building blocks” in proportions sufficient to
ensure “good mixing” in the juxtapositional phase. In this context (and thus in the remainder of this
section), building blocks are those order-$k$ potential building blocks which juxtapose to form “the”
order-$k$ optimal individual. Where no unique order-$k$ optimal individual exists, “building blocks” are not well
defined.

The earliest techniques [27, 28, 35, 46, 54] are essentially heuristic (Section 2.6.5.1). Kargupta's more
recent methodology [47] is less heuristic and yields parameters resulting in improved effectiveness in a limited
study. It is based on a more complete model of tournament selection than the earlier techniques
(Section  2.6.5.2).  None  of  the  techniques  predicts  the  expected  effectiveness  of  the  algorithm,  nor  whether 
improved effectiveness may result from “tweaking” the parameters.

2.6.5.1 Heuristic Techniques. The technique proposed by Goldberg, et al. in their fmGA
study [35] is based on the heuristic that an adequate final primordial phase population results when each
selection episode produces a sufficient number of copies of each building block to prevent extinction by BBF.
Based on the probability of building block survival, a building-block repetition factor of

$$\gamma = \frac{\binom{\lambda^{(i-1)}}{\lambda^{(i)}}}{\binom{\lambda^{(i-1)} - k}{\lambda^{(i)} - k}} \approx \left(\frac{\lambda^{(i-1)}}{\lambda^{(i)}}\right)^k = \rho_i^{-k}, \quad \text{for } \lambda^{(i)} \gg k,$$

where $\rho_i = \lambda^{(i)} / \lambda^{(i-1)}$, is sufficient for at least one copy of an order-$k$ building block to survive a reduction of
string length from $\lambda^{(i-1)}$ to $\lambda^{(i)}$.

The fmGA study proposes “fixing $\gamma$ to a constant value much less than $2^{t_s}$,” where $t_s$ is the number of
selection repetitions per length reduction. Doing so “roughly implies a fixed length-reduction ratio $\rho = \rho_i$
for all $i$.” It is not clear how $t_s$ should be chosen, nor is it clear how to choose $\rho$ except that $\gamma < 2^{t_s}$ should
be satisfied.
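The repetition factor can be evaluated directly. The Python sketch below assumes the factor is the reciprocal of the probability that all $k$ genes of a building block are among the retained genes when an individual is filtered from one length to a shorter one, and compares it with the power-law approximation; the function and parameter names are hypothetical.

```python
from math import comb


def repetition_factor(lam_from, lam_to, k):
    """Reciprocal of the building block survival probability under filtering."""
    return comb(lam_from, lam_to) / comb(lam_from - k, lam_to - k)


def repetition_factor_approx(lam_from, lam_to, k):
    """Approximation (lam_from / lam_to)^k, reasonable when lam_to >> k."""
    return (lam_from / lam_to) ** k
```

Because each of the $k$ factors in the exact ratio is at least `lam_from / lam_to`, the approximation never exceeds the exact value, and the gap widens as the retained length approaches $k$.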

Regarding the use of the thresholding parameters proposed earlier by Goldberg, Deb, and Korb [31],
the fmGA study reports that “this procedure has not proven to be adequate.” Instead, a threshold of


is suggested, where $\lambda_1$ and $\lambda_2$ are the lengths of the competing individuals, $\ell$ is the nominal string length,
and $\sigma^2$ is the variance of the hypergeometric distribution [53] having parameters $\lambda_1$, $\lambda_2$, and $\ell$:

$$\sigma^2 = \frac{\lambda_1 (\ell - \lambda_1) \lambda_2 (\ell - \lambda_2)}{\ell^2 (\ell - 1)}$$

The experiments reported in the fmGA study do not use parameters obtained using the proposed methodology.
According to Kargupta [46],

The  [experimental]  results  presented  in  the  ICGA  paper  were  based  on  [an]  empirically  tuned 
schedule,  since  we  did  not  have  the  complete  theoretical  analysis  of  [the  fast  messy  genetic 
algorithm]  at  that  time. 

The empirical tuning involves measuring the fraction of the individuals in each generation containing each building block, and adjusting the parameters based on those fractions [46]. Parameters obtained via this method for a 50-bit order-5 fully deceptive function* are shown in Table 2. The specific tuning strategy by which the final parameters are obtained is not known to this author.


Table 2. Empirically tuned fmGA thresholding and filtering parameters for a 50-bit order-5 fully deceptive objective function

Episode   Cut generation   String length   Threshold
   0             0               45            39
   1             7               39            35
   2            11               34            28
   3            15               29            23
   4            19               25            18
   5            23               22            15
   6            29               19            13
   7            35               16            10
   8            41               14             9
   9            47               12             7
  10            53               10             6
  11            59                8             5
  12            65                7             4
  13            71                6             3
  14            77                5             4

This empirical tuning method requires a priori knowledge of which genes constitute building blocks. Such knowledge is not available for practical applications. Consequently, this parameter selection technique is not generally applicable. For example, in an application of the fast messy genetic algorithm to energy minimization [54], it is not known whether or not order-$k$ building blocks exist, much less which genes constitute those building blocks. Furthermore, the execution time required for this application prohibits any substantial empirical tuning of the exogenous parameters. The experiments reported in the energy minimization study use a heuristically “scaled” version of the schedule in Table 2, which is shown in Table 3.

* Although apparently not the same function addressed in the fmGA study.

Table 3. Scaled fmGA thresholding and filtering parameters for a 240-bit objective function

Episode   Generation   String length   Threshold
   0           0            216           194
   1           7            185           143
   2          11            157           107
   3          15            135            84
   4          19            115            64
   5          23             98            47
   6          29             84            38
   7          35             72            31
   8          41             61            25
   9          47             53            21
  10          53             45            17
  11          59             39            15
  12          65             33            12
  13          71             29            10
  14          77             24             8
  15          82             21             7
  16          87             18             6
  17          92             15             5
  18          97             13             4
  19         102             11             4
  20         107              9             3
  21         112              8             3
  22         117              7             2
  23         122              6             2
  24         127              5             3

In another application to the same problem [28], the average effectiveness of the algorithm resulting from this schedule is compared to three others:

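1. “50% initial similarity, linearly increasing to 100% similarity,” i.e. the threshold mapping in episode $e$ is such that $\theta_{(e)} = \left(\tfrac{1}{2} + \tfrac{1}{2}\tfrac{e}{e_f}\right)\lambda_{(e)}$, where $e_f$ is the final selection episode;

2. “50% initial similarity, linearly increasing to 80% similarity,” i.e. $\theta_{(e)} = \left(\tfrac{1}{2} + \tfrac{3}{10}\tfrac{e}{e_f}\right)\lambda_{(e)}$; and

3. “constant 80% similarity,” i.e. $\theta_{(e)} = \tfrac{4}{5}\lambda_{(e)}$.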
The results obtained for this application, while far from exhaustive, indicate that the last schedule is the most effective of those compared.
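The three similarity-based schedules differ only in the start and end fractions of a linear interpolation over episodes. A minimal sketch follows; the function name is hypothetical, and leaving the threshold unrounded is an assumption, since the rounding convention is not stated.

```python
def similarity_threshold(length: float, episode: int, final_episode: int,
                         start: float = 0.5, end: float = 1.0) -> float:
    """Threshold as a linearly interpolated fraction of the string length.

    The fraction grows from `start` in episode 0 to `end` in the final episode.
    Setting start == end gives a constant-similarity schedule.
    """
    fraction = start + (end - start) * episode / final_episode
    return fraction * length


# String lengths for the first few episodes of Table 3, for illustration.
lengths = [216, 185, 157, 135]
constant_80 = [similarity_threshold(lam, e, 24, 0.8, 0.8) for e, lam in enumerate(lengths)]
print(constant_80)
```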

2.6.5.2 Kargupta's Technique. Kargupta's more recent methodology [47] is less heuristic in nature and yields parameters resulting in improved effectiveness in a limited study compared to the method of Goldberg, et al. [29]. This section formalizes Kargupta's description of this technique. The stated design objective of the technique is to ensure that the fraction of individuals containing each of the building blocks grows nearly uniformly. This growth is achieved and controlled through the choice of three sets of design variables:

1. the duration $t^*_e$ of each selection episode $e$,

2. the threshold parameter $\theta_{(e)}$ for each selection episode $e$, and

3. the number of genes $\lambda_{(e)} - \lambda_{(e+1)}$ deleted in each filtering event.

Formal  statement  of  the  technique  is  facilitated  by  a  brief  review  of  the  underlying  theory. 

The theoretical development includes a more “realistic” model of BTS with thresholding than that used in the research of Goldberg, et al. [35], focusing on “cross-competition” between building blocks. The model views individuals as containing no more than one of $m$ building blocks. That is, individuals are viewed as belonging to one of $m + 1$ classes, where $m$ is the number of building blocks:

• the classes “$i$”, where $i \in \{1, \ldots, m\}$, consisting of the individuals containing building block $i$ (assumed to be mutually exclusive), and

• the class “junk” of individuals containing no building blocks.

The fraction of individuals in class $i$ in generation $t + 1$ is modeled by

\[ q_{i,t+1} = q_{i,t}\Bigl(2 - q_{i,t} - \sum_{j \ne i} (2 - a_{ij})\, q_{j,t}\Bigr) \]

and the fraction of individuals in junk is

\[ q_{junk,t+1} = q_{junk,t}^{2} = \left(q_{junk,0}\right)^{2^{t+1}}, \]

where $a_{ij}$ is “the expected number of copies of building block $i$ resulting from competition with building block $j$.” The matrix $a$ having components $a_{ij}$ is called the interaction matrix.

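Using this model, Kargupta considers two “extreme” cases. The first is the “unbiased” case, in which $a_{ij} = 1$ for all $i, j$, corresponding to equally scaled building blocks, so that the interaction matrix is

\[ a = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} . \]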

The  second  is  the  “strong  bias”  case,  in  which  the  interaction  matrix  is  of  the  form 

(5) 

In the strong bias case, every individual which contains a particular building block $i$ is more fit than every individual which lacks building block $i$ (Equation 5 assumes without loss of generality that $i = 1$). Also, for some building block $j$, every individual which contains a building block $k \ne j$ is more fit than every individual which contains building block $j$ (Equation 5 assumes, again without loss of generality, that $j = 2$). Excluding building blocks $i$ and $j$, the strong bias case is identical to the unbiased case.
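The two extreme cases can be explored numerically. The sketch below is illustrative only and rests on two assumptions: that the class update takes the form $q_{i,t+1} = q_{i,t}(2 - q_{i,t} - \sum_{j \ne i}(2 - a_{ij})q_{j,t})$, and that the strong bias matrix can be inferred from the verbal description (a winner receives two expected copies, a loser none, so that $a_{ij} + a_{ji} = 2$). All function names are hypothetical, and classes are indexed from 0.

```python
def unbiased_matrix(m):
    """Interaction matrix with a_ij = 1 for all i, j."""
    return [[1.0] * m for _ in range(m)]


def strong_bias_matrix(m):
    """Assumed strong-bias matrix: block 0 always wins, block 1 always loses."""
    a = unbiased_matrix(m)
    for j in range(1, m):
        a[0][j], a[j][0] = 2.0, 0.0      # block 0 beats every other block
    for k in range(2, m):
        a[1][k], a[k][1] = 0.0, 2.0      # every remaining block beats block 1
    return a


def selection_step(q, a):
    """One generation of the assumed class-fraction update."""
    m = len(q)
    return [q[i] * (2.0 - q[i]
                    - sum((2.0 - a[i][j]) * q[j] for j in range(m) if j != i))
            for i in range(m)]


def junk_fraction(q):
    """Fraction of individuals containing no building block."""
    return 1.0 - sum(q)
```

Under these assumptions the junk fraction squares in every generation regardless of the interaction matrix, while the strong bias matrix separates the growth rates of the favored and disfavored building blocks.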
The theoretical development motivating Kargupta's parameter selection technique also includes a model of BBF, essentially identical to that of Goldberg, et al. [35]. The fraction of individuals in class $i$ following the BBF event following selection episode $e$ is modeled by

\[ q_{i,0,e+1} = \eta_{e+1}\, q_{i,t^*_e,e} \]

and the fraction of individuals in junk is

\[ q_{junk,0,e+1} = q_{junk,t^*_e,e} + (1 - \eta_{e+1}) \sum_{i=1}^{m} q_{i,t^*_e,e} , \]

where

\[ \eta_{e+1} = \binom{\lambda_{(e)}-k}{\lambda_{(e+1)}-k} \Big/ \binom{\lambda_{(e)}}{\lambda_{(e+1)}} . \]


Based on these models of BTS and building block filtering, Kargupta proposes a methodology by which to obtain fast messy genetic algorithm exogenous parameters. The stated design principle motivating the technique is the control of “niche sizes.” In brief, Kargupta seeks to choose the thresholds so that filtered individuals are $\theta_{(e+1)}$-compatible if and only if their parents are $\theta_{(e)}$-compatible, i.e. that

\[ \lambda_c(a,b) \ge \theta_{(e)} \iff \lambda_c(m(a), m(b)) \ge \theta_{(e+1)} \]

is satisfied. The technique is apparently designed to satisfy the condition in expectation in some sense. It may be stated formally as shown in Figure 16. Kargupta views $\gamma$, $\delta$, and $\beta$ as design parameters, but offers little guidance as to how they should be selected for a given application. The experiments he reports are for an order-5 fully deceptive fitness function, with $\rho = 0.5$, $\delta = 0.01$ and $\beta = 2$, which implies that $\gamma = \rho^{-k} = 32$. Because $\gamma < 2^{t_s}$, Kargupta's experiments are apparently for $t_s > 5$.
1. Fix $\gamma \ll 2^{t_s}$, $\rho = \gamma^{-1/k}$, $\delta \in (0,1)$, and $\beta > 1$.

2. Take $\lambda_{(0)} = \ell - k$ and $\theta_{(0)} = [\ldots]$.

3. For each episode $e$, with $a, b \sim U(I(\lambda_{(e)}))$:

(a) Take $\lambda_{(e+1)} = \rho\lambda_{(e)}$.

(b) Take $t^*_e = \max\{t \in \mathbb{Z}^+ : (\forall i,j)[\,|q_{i,t} - q_{j,t}| < \delta\,]\}$.

(c) If $\lambda_{(e)} > \beta k$, take $\theta_{(e+1)} = E[\lambda_c(m(a), m(b)) \mid \lambda_c(a,b) < \theta_{(e)}]$.

(d) If $\lambda_{(e)} \le \beta k$, take $\theta_{(e+1)} = E[\lambda_c(m(a), m(b))]$.

Figure 16. Kargupta's fmGA Parameter Selection Technique
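The expectations in steps 3(c) and 3(d) can be evaluated exactly under simplifying assumptions. The sketch below assumes that the loci of $a$ and $b$ are independent uniformly random subsets of the $\ell$ positions, so that $\lambda_c(a,b)$ is hypergeometric, and that each parent is filtered independently by retaining $\lambda_{(e+1)}$ of its $\lambda_{(e)}$ genes uniformly at random, so that each shared locus survives in both offspring with probability $(\lambda_{(e+1)}/\lambda_{(e)})^2$. The function names are hypothetical and the result is left unrounded; this is not Kargupta's implementation.

```python
from math import comb


def common_loci_pmf(ell, lam):
    """P[lambda_c = c] for two uniformly random sets of lam loci out of ell."""
    total = comb(ell, lam)
    return {c: comb(lam, c) * comb(ell - lam, lam - c) / total
            for c in range(max(0, 2 * lam - ell), lam + 1)}


def next_threshold(ell, lam, lam_next, theta, beta, k):
    """Expected number of shared loci after filtering (steps 3(c) and 3(d))."""
    pmf = common_loci_pmf(ell, lam)
    keep = (lam_next / lam) ** 2   # a shared locus must survive both filterings
    if lam > beta * k:             # step (c): condition on incompatible parents
        mass = sum(p for c, p in pmf.items() if c < theta)
        if mass == 0.0:
            raise ValueError("conditional expectation undefined: "
                             "all pairs are compatible")
        return keep * sum(c * p for c, p in pmf.items() if c < theta) / mass
    return keep * sum(c * p for c, p in pmf.items())   # step (d)
```

Under these assumptions the unconditional expectation reduces to $\lambda_{(e+1)}^2/\ell$, and the conditional expectation is undefined exactly when $\theta_{(e)} \le 2\lambda_{(e)} - \ell$.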

The choice of the initial individual length $\lambda_{(0)}$ is consistent with the recommendations of Goldberg, et al. [35]. The individual lengths $\lambda_{(e)}$ resulting from the constant string reduction ratio $\rho$ are also consistent with those recommendations. Similarly, the choice of the initial threshold $\theta_{(0)}$ is consistent with the original messy genetic algorithm thresholding theory [16].

Each $t^*_e$ is chosen so that as many iterations of selection as possible are performed while ensuring “even growth” of the building blocks within episode $e$. Kargupta does not address the existence or determination of a $\delta \in (0,1)$ such that each $t^*_e \ge 1$ for a given application. If such a $\delta$ does not exist or simply cannot be readily identified, the technique fails.

The last two steps choose the threshold parameters heuristically so as to control the “niche sizes,” as discussed previously. Strictly speaking, Kargupta's description of the technique specifies a choice of the threshold reduction $\theta_{(e)} - \theta_{(e+1)}$, rather than the threshold $\theta_{(e+1)}$ itself. In the early episodes (for which $\theta$ is determined by the conditional expectation), the thresholds are chosen to be relatively small, thus permitting relatively unrestricted competition. In the late episodes (for which $\theta$ is determined by the unconditional expectation), the thresholds are more conservative, thus reducing cross competition.

Kargupta offers little justification for the specific expectations recommended, except that they resulted in improved effectiveness over previous scheduling techniques for the experiments performed. Nor does he address the fact that the conditional expectation fails to exist when $\theta_{(e)} \le 2\lambda_{(e)} - \ell$, i.e. when all individuals are $\theta_{(e)}$-compatible.†

2.7 Summary

Evolutionary  algorithms  are  a  class  of  stochastic  population-based  algorithms  which  are  commonly 
applied  as  optimum  seeking  techniques.  Included  within  this  broad  class  are  the  loosely  defined  classes  of 
genetic  algorithms,  evolutionary  programming,  and  evolution  strategies.  A  novel  framework  for  evolutionary 
algorithms  is  proposed,  in  which  evolutionary  operators  are  viewed  as  mappings  from  parameter  spaces  to 
random  population  transformations.  Definitions  of  recombination,  mutation,  and  selection  operators  which 
capture  their  distinguishing  characteristics  are  proposed  within  this  framework. 

A specific example of evolutionary algorithms is the simple genetic algorithm (sGA). Another class of evolutionary algorithms, which historically arose from genetic algorithms research, is the class of linkage-friendly genetic algorithms (lfGAs). The primary distinctions of lfGAs, as defined in this research, are their use of order-invariant representation schemes and strictly invariant selection operators (see Figure 17). Specific examples of lfGAs include the messy genetic algorithm (mGA), the fast messy genetic algorithm (fmGA), and the gene expression messy genetic algorithm (gemGA). The effectiveness of the fmGA is sensitive to the sequences of individual lengths and threshold mappings. Existing fmGA parameter selection techniques do not reliably yield satisfactory effectiveness for practical applications.


† Kargupta's actual recommendation for early episode thresholds is $\theta_{(e+1)} = E[\lambda_c(m(a), m(b)) \mid [\ldots] < \lambda_c(a,b) < \theta_{(e)}]$, but this seems particularly arbitrary and is inconsistent with other parts of the discussion. Furthermore, this fails to exist if $\theta_{(e)} \ge \lambda_{(e)}$. This condition occurs, for example, in the initial selection episode when $\lambda_{(0)} > k$ so that $\theta_{(0)} = \lambda_{(0)}$.
III. Generalized Fast Messy Genetic Algorithms

This chapter proposes a novel linkage-friendly genetic algorithm. The algorithm shares the high-level structure of the fast messy genetic algorithm, shown in Figure 14, as well as the representation scheme of the messy genetic algorithm (mGA) and fast messy genetic algorithm (fmGA), defined in Section 2.6.1. Consequently, it is convenient to refer to the algorithm as the generalized fast messy genetic algorithm (gfmGA).

In  the  gfmGA  initialization  phase,  a  competitive  template  is  selected  and  an  initial  population  is 
randomly  generated.  The  gfmGA  primordial  phase  uses  the  probabilistic  building  block  filtering  operator 
(Section  3.1)  and  binary  tournament  selection  with  probabilistic  thresholding  operator  (Section  3.2).  Both  of 
these operators are novel generalizations of the operators used by the fmGA. The juxtapositional phase uses
the  cut-and-splice  operator  (Section  2.6.2.1),  as  well  as  the  binary  tournament  selection  with  probabilistic 
thresholding  operator.  Section  3.3  defines  the  gfmGA  in  the  formal  framework  of  Section  2.3  and  shows  that 
the  fmGA  is  a  special  case  of  the  gfmGA. 

Mathematical  models  of  the  two  gfmGA  primordial  phase  operators  are  developed  in  Chapters  IV 
and  V.  Together,  the  models  permit  the  definition  of  expected  gfmGA  effectiveness  as  a  continuously 
differentiable  function  of  the  gfmGA  parameters  (Chapter  VI).  Optimization  of  the  related  cost  function  J 
by  various  techniques  yields  parameter  selection  methodologies  for  the  fmGA  and  the  gfmGA. 

Because  the  fmGA  is  a  special  case  of  the  gfmGA,  existence  is  guaranteed  of  parameters  for  which  the 
gfmGA  expected  effectiveness  is  no  worse  than  the  best  possible  fmGA  expected  effectiveness. 

Furthermore, partly because the gfmGA parameters are real-valued, vector space optimization techniques may be used to obtain formal necessary optimality conditions (NOCs) for the gfmGA parameters.

3.1  Mutation 

The generalized fast messy genetic algorithm uses the probabilistic building block filtering (probabilistic BBF) operator, which this research views as a mutation operator. The probabilistic BBF operator is a novel generalization of the deterministic BBF operator (Section 2.6.2.2). Whereas the deterministic building block filtering operator deletes the same fixed number $\lambda^{(t)} - \lambda^{(t+1)}$ of genes from each individual in generation $t$, the probabilistic BBF operator adds or deletes a random number of genes, determined independently for each individual.

If $\lambda^{(t+1)} < \lambda^{(t)}$, the genes to be deleted are drawn without replacement from a uniform distribution over all of the individual's genes. Equivalently, the genes to be retained are chosen uniformly without replacement. If $\lambda^{(t+1)} > \lambda^{(t)}$, the genes to be added are generated by drawing without replacement from the set of loci for which the individual does not already contain a gene, then drawing from the genic alphabet independently for each new gene. Thus, the operator preserves the non-overspecified property of primordial phase individuals. It is convenient to define the probabilistic BBF operator in terms of the generalized local BBF operator.

Definition  3.1.1  (Generalized  local  building  block  filtering  operator):  Let  I  be  an  IfGA  individual 
space  over  genic  alphabet  A  with  nominal  string  length  I, 

=  |(T  €  T  ^{0, .. .  ,f},  y  TTi^  :  d-(?:)  e 

=  A^.u}  =  («5'i.(T2,(ai....,af))  ~  U(Ll).  sort({/3i . =  (/!„,....  ./3„J  such  that  pn,  < 


m[((ai./i). . . . ,  {ax„.lx,)),  (m, . .  .,ae),  (/Ia„+i,  . . ai.aj.  A/] 

f 

((^cri(l)?/a-i(i)) - .  if  Xf  <  Ao 

A  / 

—  <  (  (®cri(l)- ^cjl(l)) - ^i^(Ti{Xf)J(TiiXf)}-, 

(^cr2(l)  +  Ao'/^  (To  (l)+Ao) . ^*^cr2(A/ —  Ao)' ^o'2(  —  Aq)  )  ^  '  f/  ^  Aq 
and  m'  6  £VOV(IA.^,^)  an  evolutionary  operator.  If  for  every  Xf  €  {0,,..,^}  (the  offspring  individual 
length^,  and  every  a  =  ({aiJi) - A^Xo-ho))  G  L  m'  satisfies 


=  m(a,(Qi . a<).sor<!({l . -  {h . iAo})'<^i('^o),^2(^  -  Ao).A/)  , 


then $m'$ is called a generalized local building block filtering operator. □
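The behavior of the local operator on a single individual can be sketched as follows. This is an illustrative reading of the prose description (uniform retention when shrinking, uniformly chosen unoccupied loci with independently drawn alleles when growing), not a transcription of the formal definition. The dictionary representation, the function name, and the default binary alphabet are assumptions.

```python
import random


def filter_individual(genes, target_length, ell, alphabet=(0, 1), rng=random):
    """Return an offspring with exactly target_length genes.

    genes maps each defined locus (1..ell) to its allele, so an individual
    never holds more than one gene per locus (it is not overspecified).
    """
    if not 0 <= target_length <= ell:
        raise ValueError("target length must lie in 0..ell")
    loci = sorted(genes)
    if target_length <= len(loci):
        # Shrink: retain a uniformly chosen subset of the existing genes.
        kept = rng.sample(loci, target_length)
        return {locus: genes[locus] for locus in kept}
    # Grow: choose unoccupied loci without replacement, then draw alleles.
    missing = [locus for locus in range(1, ell + 1) if locus not in genes]
    child = dict(genes)
    for locus in rng.sample(missing, target_length - len(loci)):
        child[locus] = rng.choice(alphabet)
    return child
```

Because new genes are only placed at loci the parent does not define, the offspring has at most one gene per locus whether genes are added or deleted.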

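The number of genes added or deleted from a parent individual in generation $t \in \{0, \ldots, t_p - 1\}$ is determined by the offspring individual length $\lambda^{(t+1)}$, which is a random variable chosen according to

\[ \psi^{(t)}(\lambda) = \Pr[\lambda^{(t+1)} = \lambda] , \]

where each $\psi^{(t)}(\lambda)$ is an exogenous filtering parameter. Because each $\psi^{(t)}$ is a probability density function of the discrete type,

\[ \psi^{(t)}(0) = 1 - \sum_{\lambda=1}^{\ell} \psi^{(t)}(\lambda) , \]

and the $\psi^{(t)}(\lambda)$'s are subject to the constraints

\[ (\forall \lambda \in \{1, \ldots, \ell\})[\psi^{(t)}(\lambda) \ge 0] \]

and

\[ \sum_{\lambda=1}^{\ell} \psi^{(t)}(\lambda) \le 1 . \qquad (6) \]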
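A minimal sketch of how an offspring length might be sampled from the filtering parameters follows. The list layout (entry `j` holds the probability of length `j + 1`), the function names, and the numerical tolerance are assumptions made for illustration.

```python
import random


def full_distribution(psi):
    """Return probabilities for lengths 0..ell, given psi for lengths 1..ell.

    Raises ValueError if psi violates the non-negativity or sum constraint.
    """
    if any(p < 0 for p in psi) or sum(psi) > 1 + 1e-12:
        raise ValueError("filtering parameters violate the constraints")
    return [max(0.0, 1.0 - sum(psi))] + list(psi)


def sample_offspring_length(psi, rng=random):
    """Draw one offspring length from the distribution defined by psi."""
    weights = full_distribution(psi)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

When all of the probability mass is placed on a single length, every offspring has that length, which is the deterministic filtering behavior.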


Definition  3.1.2  (Probabilistic  building  block  filtering  operator):  Let  I  he  an  IfGA  individual 
space  over  genic  alphabet  A  with  nominal  string  length  t.  pi  =  p!  Q.  (the  population  size/, 


—  ('5)r  X  (Stt  X  .  u3  —  (wi . ^fi')  ~  U{il).  in'  a  generalized  local  BBF  operator,  and  m  :  [0. 1]^  — ► 

T(^l.T(I>^  ,(I(\f)y^  ))  an  evolutionary  operator.  If  for  every  V’  =  (‘>p(l) . V'(0)  satisfying  Equations  6 

(the  filtering  parameters/,  every  P  £  (the  parent  population),  and  every  i  £  {1 _ ,/t'},  m  satisfies 


I-ELiV-IA)  .ifXf=0 
then  m  is  called  a  probabilistic  building  block  filtering  operator.  □ 


3.2  Selection 

The gfmGA primordial phase uses the binary tournament selection (BTS) with probabilistic thresholding operator. As in the deterministic case (Section 2.6.2.3), competition is restricted to those individuals which are determined to be compatible. In contrast to the deterministic case, individuals are considered compatible with a probability which depends on their similarity. The formal definition of probabilistic compatibility is more general in that it does not require the threshold mapping to depend on an individual similarity.

Definition 3.2.1 (Probabilistically $\Theta$-compatible): Let $I$ be a non-empty set (the individual space) and $\Theta : I^2 \to [0,1]$ (the probabilistic threshold mapping). Then $a \in I$ and $b \in I$ are called probabilistically $\Theta$-compatible with probability $\Theta(a,b)$. If $X \sim U([0,1])$ is sampled and $X < \Theta(a,b)$ then $a$ and $b$ are found to be probabilistically $\Theta$-compatible. □

For the gfmGA, individuals $a \in I(\lambda_a)$ and $b \in I(\lambda_b)$ sharing $\lambda_c = \lambda_c(a,b)$ common defining loci are probabilistically $\Theta$-compatible with probability $\Theta^{(t)}(\lambda_c; \lambda_a, \lambda_b)$, where each $\Theta^{(t)}(\lambda_c; \lambda_a, \lambda_b)$ is an exogenous thresholding parameter.
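The compatibility test itself is a single comparison against a uniform sample. The sketch below is illustrative; the function names and the dictionary representation of individuals are assumptions. It also shows how a mapping that takes only the values 0 and 1 reproduces a deterministic threshold test.

```python
import random


def found_compatible(a, b, threshold_mapping, rng=random):
    """Sample X uniformly from [0, 1) and report whether X < mapping(a, b)."""
    return rng.random() < threshold_mapping(a, b)


def hard_threshold(similarity, theta):
    """Threshold mapping that takes only the values 0 and 1."""
    return lambda a, b: 1.0 if similarity(a, b) >= theta else 0.0


def common_loci(a, b):
    """Number of loci defined in both individuals (dicts locus -> allele)."""
    return len(set(a) & set(b))
```

With `hard_threshold`, the outcome no longer depends on the sampled value, so the probabilistic test behaves like a fixed similarity threshold.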
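The thresholding parameters are probabilities for each $t \in \{0, \ldots, t_f - 1\}$, and hence subject to the constraints

\[ \forall \lambda, \lambda_a, \lambda_b : \Theta^{(t)}(\lambda; \lambda_a, \lambda_b) \ge 0 , \]

\[ \forall \lambda, \lambda_a, \lambda_b : \Theta^{(t)}(\lambda; \lambda_a, \lambda_b) \le 1 . \qquad (7) \]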


Definition  3.2.2  (Binary  toiurnament  selection  with  probabilistic  thresholding  operator  and 
finite  shnffle  size  Let  I  he  a  non-empty  set  (the  individual  space/.  I  G  Z'*'.  p  €  Z+  (the  parent 

population  size/,  p!  €  Z+  (the  offspring  population  size  n^h  €  Z+  (the  shuffle  size^,  =  ({1,. .  x 

w  =  ((a;o(l), . . . ,  Wn.  Jl)). . . . ,  . . . ,  (//),  X)  ~  U(9-)  , 

and  s  €  SVOV(I.p,T(P,Y),l\)  x  T(/,R).f2).  Also,  define  j  :  {1 . p'}  x  T(I\[0,1])  {0 . by 


\  0  .ifm[Xik>e{p^,^i),p^^^i,)] 

\  min{/c  :  Xik  <  ^{Puod)- Puk{i))}  •  otherwise 


If  for  every  e  :  P  — »  [0.1]  (the  threshold  mapping^,  every  fitness  function  $  :  /  — and  every  population 
P  E  s  satisfies 


;  otherwise 


then  s  is  called  a  binary  tournament  selection  with  thresholding  operator.  If  for  every  a  G  I(Xa)  and  every 
b  G  /(Afc),  $  also  satisfies  ^  ^(a, b)  =  where  6  is  the  Kronecker  delta  function,  then  s  is 

called  length  preserving.  q 
As in the deterministic case, the BTS with probabilistic thresholding operator is an order-based selection operator. Thus, it is strictly invariant (Theorem 2.5.5). Consequently, the effectiveness of any evolutionary algorithm using BTS with probabilistic thresholding, including the gfmGA, is unaffected by the choice of (order-preserving) fitness scaling function.

3.3  Algorithmic  Specification 

The  preceding  sections  describe  the  novel  operators  used  by  the  gfmGA.  This  section  specifies  the 
gfmGA in the formal framework of Section 2.3, and shows that the fmGA is a special case of the gfmGA.

Definition 3.3.1 (Generalized fast messy genetic algorithm): Let

• $I$ be an lfGA individual space over the genic alphabet $\{0,1\}$ with nominal string length $\ell$ and overflow factor $o$,

• $I(\lambda)$ defined by Equation 2,

• $k \in \{1, \ldots, \ell\}$ (the building block size),

• $t_f \in \mathbb{Z}^+$ (the final generation),

• $t_p \in \{0, \ldots, t_f\}$ (the final primordial phase generation),

• $\lambda^{(0)} = \ell - k$ (the initial individual length),

• a sequence $\psi \subset [0,1]^{\ell}$ satisfying Equations 6 (the filtering parameters),

• $\alpha \in [0,1]$ (the probability of selection error),

• $z_\alpha \in \mathbb{R}$ such that $Z \sim N(0,1) \implies \Pr[Z > z_\alpha] = 1 - \alpha$,

• $\beta^2 \in \mathbb{R}$ (the maximum inverse signal-to-noise ratio per subfunction to be detected),

• $\mu = \mu' = 2 z_\alpha^2 \beta^2 (m - 1) 2^k$ (the population size),

• $c \in I(\ell)$ (the competitive template),

• $\Phi : I \to \mathbb{R}$ an lfGA fitness function,

• $\iota$, taking values in $\{\text{true}, \text{false}\}$ (the termination criterion), such that

\[ \iota(\{P(0), \ldots, P(t)\}) = \text{true} \iff \operatorname{card}(\{P(0), \ldots, P(t)\}) > t_f , \]

• $r$ a sequence of Goldberg cut-and-splice operators,

• $m$ a sequence of evolutionary operators,

• for $0 \le t < t_p$, $m^{(t)}$ a probabilistic BBF operator with parameter space $[0,1]^{\ell}$,

• for $t_p \le t \le t_f$, $m^{(t)}$ an identity evolutionary operator,

• $s$ a sequence of BTS with probabilistic thresholding operators,

• $\theta_m^{(t)} = \psi^{(t)}$ for $0 \le t < t_p$ (the filtering parameters),

• $p_c^{(t)} = p_s^{(t)} = 0$ for $0 \le t < t_p$,

• $\theta_r^{(t)} = (p_c^{(t)}, p_s^{(t)})$ for $0 \le t \le t_f$ (the cut-and-splice parameters), and

• $\Theta$ a sequence of threshold mappings $\Theta^{(t)} : I^2 \to [0,1]$.

Then the algorithm shown in Figure 18 is called a generalized fast messy genetic algorithm.


$t := 0$;
initialize $P(0) := \{a_1(0), \ldots, a_\mu(0)\} \sim U(I(\lambda^{(0)}))$;
while ($\iota(\{P(0), \ldots, P(t)\}) \ne$ true) do
    recombine: $P'(t) := r^{(t)}_{\theta_r^{(t)}}(P(t))$;
    mutate: $P''(t) := m^{(t)}_{\theta_m^{(t)}}(P'(t))$;
    select: $P(t+1) := s^{(t)}_{(\Theta^{(t)}, \Phi)}(P''(t))$;
    $t := t + 1$;
od


Figure 18. Outline of a Generalized Fast Messy Genetic Algorithm
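The control flow of the outline can be sketched as a generic generational loop. This is an illustrative skeleton only: the three operators are passed in as plain callables taking the generation index and a population, and the termination test counts the populations generated so far, which is an assumed reading of the cardinality condition. The function name and signature are hypothetical.

```python
def run_generational_loop(initial_population, recombine, mutate, select,
                          final_generation):
    """Apply recombination, mutation and selection until termination.

    Returns the list of populations P(0), ..., P(t) generated.
    """
    history = [initial_population]
    t = 0
    # Terminate once more than final_generation populations have been produced.
    while not len(history) > final_generation:
        recombined = recombine(t, history[-1])
        mutated = mutate(t, recombined)
        history.append(select(t, mutated))
        t += 1
    return history


def identity(t, population):
    """Operator that leaves the population unchanged."""
    return population
```

Passing `identity` for the recombination operator during the primordial phase mirrors the choice of zero cut and splice probabilities for those generations.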
Building on the recommendation of Goldberg, et al. regarding the mGA [36], the algorithm may be applied iteratively for increasing building block sizes $k \ge 2$. The objective of iteration $k$ is to identify an order-$k$ optimal individual, given an order-$(k-1)$ optimal competitive template $c$. The optimality condition on $c$ for the second iteration ($k = 2$) is order-1, which may be satisfied efficiently by hill-climbing in $I(\ell)$.

The gfmGA uses an order-invariant representation and a strictly invariant selection operator. Thus, it is a linkage-friendly genetic algorithm. The remainder of this section shows formally that the fmGA is a special case of the gfmGA, in the sense that every instantiation of the fmGA is equivalent to an instantiation of the gfmGA, but the converse does not hold. The first lemma considers the relationship between the deterministic and probabilistic BBF operators.

Lemma 3.3.2 The building block filtering operator is a special case of the probabilistic building block filtering operator.

Proof: Let $I$ be an lfGA individual space, $\mu \in \mathbb{Z}^+$, $P \in I^\mu$, $m$ the BBF operator, and $\lambda_f \in \{0, \ldots, \ell\}$. Then the probability that $[m_{\lambda_f}(P)]_i$ is of length $\lambda$ is 1 if $\lambda = \lambda_f$ and 0 otherwise. This is equivalent to the probabilistic BBF operator with filtering parameters $\psi(\lambda) = \delta_{\lambda,\lambda_f}$, where $\delta$ is the Kronecker delta function. Now let $\psi$ be any filtering parameters which are not of this form, and $m$ the probabilistic BBF operator. Then with nonzero probability $[m_\psi(P)]_i$ is of length different than $\lambda_f$. ■

The  next  lemma  considers  the  relationship  between  the  deterministic  and  probabilistic  BTS  with  thresholding 
operators. 

Lemma 3.3.3 The binary tournament selection with thresholding operator is a special case of the binary tournament selection with probabilistic thresholding operator.

Proof: Let $I$ be a non-empty set, $\mu \in \mathbb{Z}^+$, $P \in I^\mu$, $s$ the BTS with thresholding operator, $d : I^2 \to \mathbb{N}$, and $\theta$ a threshold mapping. Then the probability that $a$ and $b$ are $\theta$-compatible is 1 if $d(a,b) \ge \theta(a,b)$ and 0 otherwise. This is equivalent to the BTS with probabilistic thresholding operator with threshold mapping

\[ \Theta(a,b) = \begin{cases} 1 , & \text{if } d(a,b) \ge \theta(a,b) \\ 0 , & \text{if } d(a,b) < \theta(a,b) . \end{cases} \]

Now let $T$ be any threshold mapping which is not of this form, and $s$ the BTS with probabilistic thresholding operator. Then there exist individuals $a$ and $b$ with probability of $T$-compatibility $p \notin \{0, 1\}$. ■

The  preceding  lemmas  imply  that  the  fmGA  is  a  special  case  of  the  gfmGA. 

Theorem 3.3.4 The fmGA is a special case of the set of the length-preserving gfmGA.

Proof: The result follows immediately from Lemmas 3.3.2 and 3.3.3, the definitions of the fmGA and the gfmGA, and the observation that all individuals in an fmGA primordial phase population are of the same length. ■

This  result  implies  the  existence  of  probabilistic  filtering  and  thresholding  parameters  for  which  the  gfmGA 
expected  effectiveness  is  no  worse  than  the  best  possible  fmGA  expected  effectiveness. 

3.4 Summary

The generalized fast messy genetic algorithm (gfmGA) is a novel linkage-friendly genetic algorithm (lfGA). It shares the representation scheme of the messy genetic algorithm (mGA) and the fast messy genetic algorithm (fmGA), both of which are also lfGAs. The gfmGA also shares the high-level structure of the fmGA. The gemGA is another lfGA mentioned in Chapter II (see Figure 19). The gfmGA differs from the fmGA in its use of novel probabilistic generalizations of the building block filtering and binary tournament selection with thresholding operators. Because the fmGA is a special case of the gfmGA, existence is guaranteed of parameters for which the gfmGA expected effectiveness is no worse than the best possible fmGA expected effectiveness.
IV. Probabilistic Building Block Filtering

The deterministic building block filtering (BBF) operators used by fast messy genetic algorithms delete the same fixed number of genes from each individual in a particular generation (Section 2.6.4). In contrast, the probabilistic BBF operators used by generalized fast messy genetic algorithms delete (or add) a random number of genes, the number being determined independently for each individual (Section 3.1). This chapter develops a dynamical systems model of probabilistic BBF which treats individuals as belonging to one of $2(\ell + 1)$ classes, where $\ell$ is the nominal string length. It is applicable in general to evolutionary algorithms with linkage-friendly genetic algorithm (lfGA) individual spaces over finite genic alphabets and to generalized fast messy genetic algorithms (Chapter III) in particular.

Deterministic  BBF  operators  are  modeled  by  Goldberg,  et  al.  [35].  The  model  developed  therein  views 
individuals  as  belonging  to  one  of  two  classes:  those  containing  a  particular  building  block,  and  those  lacking 
it.  The  analysis  is  restricted  to  filtering  operators  which  are  purely  destructive,  as  well  as  non-probabilistic  in 
the  sense  that  all  individuals  in  a  particular  generation  are  of  the  same  length.  Kargupta  extends  the  analysis 
of  destructive  non-probabilistic  filtering  operators  to  simultaneously  consider  multiple  building  blocks  [47]. 
His  analysis  assumes  that  no  individual  contains  more  than  one  building  block. 

Together  with  the  binary  tournament  selection  model  developed  in  Chapter  V,  the  probabilistic  BBF 
model  permits  the  prediction  of  expected  effectiveness.  The  prediction  of  expected  effectiveness  serves  as 
the  foundation  for  the  exogenous  parameter  selection  techniques  proposed  in  Chapter  VI. 

After introducing the overall form of the mathematical model and certain notation (Section 4.1),
the  probabilities  of  building  block  survival  (Section  4.2)  and  building  block  construction  (Section  4.3)  are 
developed.  The  total  probability  of  building  block  presence  after  filtering  is  developed  in  Section  4.4. 
4.1 Preliminaries

4.1.1 Notation. It is convenient to present certain definitions. Separable fitness functions are formally defined in terms of projection mappings.*

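Definition 4.1.1 (Projection mapping): Let $I$ be an lfGA individual space with genic alphabet $A$ and nominal string length $\ell$, and $\mathcal{L} = \{1, \ldots, \ell\}$. If $V_{\{L_1, \ldots, L_k\}} : A^{\ell} \to A^{k}$ is such that

\[ V_{\{L_1, \ldots, L_k\}}((a_1, \ldots, a_\ell)) = (a_{L_1}, \ldots, a_{L_k}) \]

then $V_{\{L_1, \ldots, L_k\}}$ is a projection mapping. □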

When the set of loci $\mathcal{L}$ possesses a partition $\{\mathcal{L}_1, \ldots, \mathcal{L}_m\}$, the projection mappings may be thought of as “separating” the allele vector space into independent smaller dimensional spaces. When a fitness function $\Phi$ can be written as the sum of subfunctions operating on these independent spaces, $\Phi$ is separable.

Definition 4.1.2 (Order-$k$ separable lfGA fitness function): Let $I$ be an lfGA individual space over
genic alphabet $A$ with nominal string length $\ell$, and $\Phi_c(\cdot) = T \circ f \circ D \circ \Gamma(\cdot, c) : I \to \mathbb{R}$ an lfGA fitness
function. Suppose that for some fixed $k \le \ell$ there exist

• a partition $\{\mathcal{L}_1, \ldots, \mathcal{L}_m\}$ of $\mathcal{L} = \{1, \ldots, \ell\}$ with each $l_i = \mathrm{card}(\mathcal{L}_i) \le k$; and

• functions $D_i : A^{l_i} \to \mathbb{R}^{n_i}$, $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$, and $T_i : \mathbb{R} \to \mathbb{R}$

such that

\[ \Phi_c(\cdot) = \sum_{i=1}^{m} T_i\bigl(f_i\bigl(D_i\bigl(\nabla_{\mathcal{L}_i}(\Gamma(\cdot, c))\bigr)\bigr)\bigr) , \]

where the $\nabla_{\mathcal{L}_i}$'s are projection mappings. Then $\Phi_c$ is called an order-$k$ separable lfGA fitness function. Each
$\phi_i = T_i \circ f_i \circ D_i \circ \nabla_{\mathcal{L}_i} \circ \Gamma$ is called a subfunction. □

^1 The mappings defined are not "projection" operators in the sense of linear operator theory [61]. In particular, their domains
and codomains are not in general the same space.

The notations $I(\lambda)$ and $I_P$ for important subsets of the individual space $I$ are introduced in Section 2.6.1.
It is convenient to introduce special notation for other frequently mentioned subsets of $I$, which
simplifies the analysis presented in the sequel.

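• For each $\lambda \in \{0, \ldots, \ell\}$ and each $\beta \in \{1, \ldots, m\}$, assume there exists a unique order-$k$ optimal building
block $\{(a_1^*, l_1), \ldots, (a_k^*, l_k)\}$, where $\mathcal{L}_\beta = \{l_1, \ldots, l_k\}$ is the set of defining loci for subfunction $\phi_\beta$,
and define

\[ I_\beta(\lambda) \triangleq \{ (a, l) \in I(\lambda) : (\forall i \in \mathcal{L}_\beta)(\exists j \in \mathcal{L})[(a_j, l_j) = (a_i^*, l_i)] \} . \]

Then $I_\beta(\lambda)$ is the set of length $\lambda$ individuals which contain building block $\beta$.

• For each $\lambda \in \{0, \ldots, \ell\}$ and each $\beta \in \{1, \ldots, m\}$, define

\[ I_{\neg\beta}(\lambda) \triangleq I(\lambda) - I_\beta(\lambda) . \]

Then $I_{\neg\beta}(\lambda)$ is the set of length $\lambda$ individuals which lack building block $\beta$.

Also, define the set of primordial phase individuals containing building block $\beta$ as

\[ I_\beta = \bigcup_{\lambda=0}^{\ell} I_\beta(\lambda) . \]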

Finally, the presentation is notationally simplified through the use of the hypergeometric probability density
function. If $X$ is a random variable with a hypergeometric distribution [53], then

\[ \Pr[X = x] = h(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} . \tag{8} \]

4.1.2 Dynamical Systems Model. Liepins and Vose [49] propose a dynamical systems view of a
population vector $p$ as a probability density function over the individual space, with the $i$th component being
the probability with which the $i$th individual is sampled. This research views the population vector more
generally as a density function over a set of equivalence classes which form a partition of the individual
space. Specifically, the population vector is of the form

\[ p = (p_0, p_1, \ldots, p_\ell \mid q_0, q_1, \ldots, q_\ell) . \]

Each component $p_i$ is the probability that an individual sampled from the population belongs to the equivalence
class $I_\beta(i)$ consisting of those individuals which contain building block $\beta$ and have length $i$. Similarly,
each component $q_i$ is the probability that an individual sampled from the population belongs to the equivalence
class $I_{\neg\beta}(i)$ consisting of those individuals which lack building block $\beta$ and have length $i$.

Probabilistic BBF is modeled as a deterministic transition function $T_m$ mapping the current population
vector $p$ to the expected next population vector $T_m(p)$. Because the next population vector in an infinite
population algorithm exactly matches the expected population vector, the model developed here is exact for
such algorithms. Furthermore, the transition function $T_m$ is independent of population size (see Vose and
Wright [74]). Hence, the model is also an exact expected value model for finite population size algorithms.


4.2 Building Block Survival

This section develops a mathematical model of building block survival under the probabilistic BBF
operator. The probability of building block survival is the conditional probability that an individual contains
the building block after filtering given that it does before filtering. This probability depends on the length
of the individual before and after filtering, as stated in the following theorem:

^2 Strictly speaking, building blocks are defined only in the case of separable fitness functions. For this reason, the theorems
in this chapter are proved in the context of such fitness functions. The results apply also to fitness functions which are not
separable, with the understanding that the "building block" is simply the globally optimal individual.

Theorem 4.2.1 (Building block survival) Let $I$ be an lfGA individual space with nominal string length
$\ell$, $\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of $\phi_\beta$, $k = \mathrm{card}(\mathcal{L}_\beta)$,
and $m'$ a local probabilistic BBF operator. If $X \in I_\beta(\lambda)$ and $m'(X) \in I(\lambda')$ then the probability of building
block survival is

\[ p_{surv}(k, \lambda', \lambda, \ell) = \begin{cases} h(k; \lambda', k, \lambda) , & \text{if } 0 \le \lambda' < \lambda \\ 1 , & \text{if } \lambda \le \lambda' \le \ell \end{cases} \tag{9} \]

where $h$ is defined by Equation 8.

Proof: Consider first the case $\lambda \le \lambda' \le \ell$. No genes are deleted by $m'$, so the building block survives with
probability 1. Now consider the case $0 \le \lambda' < k$. Then $m'(X)$ does not contain enough genes to contain
building block $\beta$, so the building block survives with probability $0 = h(k; \lambda', k, \lambda)$. Finally, consider the case
$k \le \lambda' < \lambda$. Then there are $\binom{\lambda}{\lambda'}$ ways to choose the $\lambda'$ genes to keep from the original $\lambda$. Also, there are
$\binom{k}{k}\binom{\lambda - k}{\lambda' - k}$ ways to choose all $k$ genes of the building block and $\lambda' - k$ more genes from the remaining $\lambda - k$
so that the building block survives. Thus, for the case $k \le \lambda' < \lambda$, the probability that filtering does not
disrupt an existing building block is

\[ \Pr[m'(X) \in I_\beta \mid X \in I_\beta(\lambda) \wedge m'(X) \in I(\lambda')] = \frac{\binom{k}{k}\binom{\lambda - k}{\lambda' - k}}{\binom{\lambda}{\lambda'}} = h(k; \lambda', k, \lambda) , \]

which completes the proof. ■
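
The survival probability of Theorem 4.2.1 is simple to evaluate numerically. The following Python sketch is an illustration only: the names `hypergeom_pmf`, `p_surv`, `lam_before`, and `lam_after` are hypothetical and do not come from the original work, and the sketch assumes the unfiltered individual already contains the building block.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, M: int, N: int) -> float:
    """h(x; n, M, N) of Equation 8: probability of drawing x marked items
    in n draws without replacement from N items, M of which are marked.
    Arguments outside the support yield probability 0."""
    if x < 0 or n < 0 or M < 0 or N < 0 or n > N or M > N:
        return 0.0
    if x > M or n - x > N - M or n - x < 0:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def p_surv(k: int, lam_after: int, lam_before: int) -> float:
    """Equation 9: probability that an order-k building block present in an
    individual of length lam_before survives filtering to length lam_after."""
    if lam_after >= lam_before:
        # No genes are deleted, so the building block cannot be disrupted.
        return 1.0
    # All k building block genes must be among the lam_after genes kept.
    return hypergeom_pmf(k, lam_after, k, lam_before)
```

For example, keeping 3 of 4 genes preserves an order-2 building block with probability $\binom{2}{2}\binom{2}{1}/\binom{4}{3} = 1/2$, and filtering to fewer than $k$ genes gives probability 0.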

Theorem  4.2.1  is  essentially  a  re-statement  of  the  probability  of  survival  claimed  by  Goldberg  et  al.  [35]  and 
later  by  Kargupta  (see  Section  2.6.5.2)  for  a  deterministic  BBF  operator,  generalized  to  apply  to  probabilistic 
BBF  operators.  The  following  theorem  gives  the  total  probability  that  an  individual  contains  a  particular 
building  block  both  before  and  after  filtering,  and  that  it  is  of  particular  lengths  before  and  after  filtering: 


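Theorem 4.2.2 (Building block presence before and after filtering)
Let $I$ be an lfGA individual space with nominal string length $\ell$, $\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a separable fitness
function, $\mathcal{L}_\beta$ the set of defining loci of $\phi_\beta$, $k = \mathrm{card}(\mathcal{L}_\beta)$, and $m'$ a local probabilistic BBF operator. If $X$ is
drawn randomly from population $P(t)$, then

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_\beta(\lambda)] = p_{surv}(k, \lambda', \lambda, \ell) \cdot \psi(\lambda'; t) \cdot \Pr[X \in I_\beta(\lambda)] , \tag{10} \]

where $\psi$ is the filtering schedule, and the $p_{surv}(k, \lambda', \lambda, \ell)$'s are defined by Equation 9.

Proof: The event $m'(X) \in I_\beta(\lambda')$ is equivalent to the event $m'(X) \in I_\beta \wedge m'(X) \in I(\lambda')$, so the probability
on the left hand side of Equation 10 may be written as

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_\beta(\lambda)] = \Pr[m'(X) \in I_\beta \wedge m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] . \]

By the definition of probabilistic BBF operator, the event $m'(X) \in I(\lambda')$ is independent of the event
$X \in I_\beta(\lambda)$, i.e.

\[ \Pr[m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] = \Pr[m'(X) \in I(\lambda')] \cdot \Pr[X \in I_\beta(\lambda)] = \psi(\lambda'; t) \cdot \Pr[X \in I_\beta(\lambda)] . \]

Suppose $\Pr[m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] = 0$. Then

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_\beta(\lambda)] = 0 = p_{surv}(k, \lambda', \lambda, \ell) \cdot 0 = p_{surv}(k, \lambda', \lambda, \ell) \cdot \psi(\lambda'; t) \cdot \Pr[X \in I_\beta(\lambda)] . \]

On the other hand, if $\Pr[m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] \ne 0$ then

\[ \begin{aligned} \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_\beta(\lambda)] &= \Pr[m'(X) \in I_\beta \mid m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] \cdot \Pr[m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] \\ &= \Pr[m'(X) \in I_\beta \mid m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)] \cdot \psi(\lambda'; t) \cdot \Pr[X \in I_\beta(\lambda)] . \end{aligned} \]

Because $\Pr[m'(X) \in I_\beta \mid m'(X) \in I(\lambda') \wedge X \in I_\beta(\lambda)]$ is the probability of building block survival, the result
follows immediately from Theorem 4.2.1. ■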

4.3 Building Block Construction

The  deterministic  BBF  operators  of  fast  messy  genetic  algorithms  are  such  that  all  individuals  in 
generation  t  +  1  have  lengths  no  greater  than  those  in  generation  t.  Probabilistic  BBF  operators  do  not 
necessarily  exhibit  this  property.  Thus,  it  is  possible  for  a  probabilistic  BBF  operator  to  construct  building 
blocks  as  well  as  disrupt  them.  This  section  develops  a  mathematical  model  of  building  block  construction 
in  generalized  fast  messy  genetic  algorithms. 

The probability of building block construction is the conditional probability that an individual contains
a particular building block $\beta$ after filtering given that it lacks the building block before filtering. This
probability is nonzero only when the unfiltered individual contains no incorrect genes. When this condition
holds, the probability depends on the length of the individual both before and after filtering, as well as
the number of missing genes. The number of missing genes in an individual $X$, i.e. the number of loci of
subfunction $\phi_\beta$ with respect to which $X$ is unspecified, is denoted $\hat{K}(X)$. That is, $\hat{K}(X) = \hat{k}$ if and only
if $X$ is specified (correctly or otherwise) with respect to exactly $k - \hat{k}$ of the loci of subfunction $\phi_\beta$, where
$k = \mathrm{card}(\mathcal{L}_\beta)$ and $\mathcal{L}_\beta$ is the set of defining loci for subfunction $\phi_\beta$. The relationship between the probability
of building block construction, the length of the individual before and after filtering, and the number of
missing genes is given by the following theorem:

Theorem 4.3.1 (Building block construction) Let $I$ be an lfGA individual space over finite genic alphabet
$A$ with nominal string length $\ell$, $\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a separable fitness function, $\mathcal{L}_\beta$ the set of
defining loci of $\phi_\beta$, $k = \mathrm{card}(\mathcal{L}_\beta)$, and $m'$ a local probabilistic BBF operator. If the alleles of pre-existing
genes are correct with probability $[\mathrm{card}(A)]^{-1}$, then the conditional probability of building block construction
given that $\hat{k}$ genes are missing is

\[ \Pr[m'(X) \in I_\beta \mid m'(X) \in I(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] = \begin{cases} 0 , & \text{if } 0 \le \lambda' < \lambda + \hat{k} \\ [\mathrm{card}(A)]^{-k}\, h(\hat{k}; \lambda' - \lambda, \hat{k}, \ell - \lambda) , & \text{if } \lambda + \hat{k} \le \lambda' \le \ell \end{cases} \]


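Proof: Suppose first $0 \le \lambda' < \lambda + \hat{k}$. Then the filtering operator does not add enough genes to complete
building block $\beta$, so the building block is constructed with probability 0. Now suppose that $\lambda + \hat{k} \le \lambda' \le \ell$.
Then there are $\binom{\ell - \lambda}{\lambda' - \lambda}$ ways to choose $\lambda' - \lambda$ loci to specify from the $\ell - \lambda$ available. Also, there are
$\binom{\hat{k}}{\hat{k}}\binom{(\ell - \lambda) - \hat{k}}{(\lambda' - \lambda) - \hat{k}}$ ways to choose the $\hat{k}$ missing loci of subfunction $\phi_\beta$, as well as $(\lambda' - \lambda) - \hat{k}$ more loci from the remaining $(\ell - \lambda) - \hat{k}$.
The alleles of the $\hat{k}$ new genes are correct with (independent) probability $[\mathrm{card}(A)]^{-1}$. By hypothesis, the
alleles of the $k - \hat{k}$ pre-existing genes are also correct with (independent) probability $[\mathrm{card}(A)]^{-1}$. Thus, the
probability of building block construction is

\[ \Pr[m'(X) \in I_\beta \mid X \in I_{\neg\beta}(\lambda) \wedge m'(X) \in I(\lambda') \wedge \hat{K}(X) = \hat{k}] = [\mathrm{card}(A)]^{-k}\, \frac{\binom{\hat{k}}{\hat{k}}\binom{\ell - \lambda - \hat{k}}{\lambda' - \lambda - \hat{k}}}{\binom{\ell - \lambda}{\lambda' - \lambda}} = [\mathrm{card}(A)]^{-k}\, h(\hat{k}; \lambda' - \lambda, \hat{k}, \ell - \lambda) , \]

which completes the proof. ■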
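
A small numerical sketch may help in reading Theorem 4.3.1. The Python below evaluates the conditional construction probability under the theorem's hypotheses; the function and parameter names (`p_construct_given_missing`, `k_missing`, `alphabet_size`) are hypothetical labels introduced here for illustration, not identifiers from the original work.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, M: int, N: int) -> float:
    """h(x; n, M, N) of Equation 8, returning 0 outside the support."""
    if x < 0 or n < 0 or M < 0 or N < 0 or n > N or M > N:
        return 0.0
    if x > M or n - x > N - M or n - x < 0:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def p_construct_given_missing(k: int, k_missing: int, lam_after: int,
                              lam_before: int, ell: int,
                              alphabet_size: int) -> float:
    """Conditional probability (Theorem 4.3.1) that filtering completes an
    order-k building block when k_missing of its loci are unspecified."""
    if lam_after < lam_before + k_missing:
        # Too few genes are added to fill every missing building block locus.
        return 0.0
    # All k_missing loci must be among the lam_after - lam_before new loci,
    # and all k alleles must be correct (each with probability 1/card(A)).
    fill = hypergeom_pmf(k_missing, lam_after - lam_before,
                         k_missing, ell - lam_before)
    return alphabet_size ** (-k) * fill
```

With $\ell = 4$, $k = 2$, a binary alphabet, one missing locus, and one gene added to a length-1 individual, the new gene lands on the missing locus with probability $1/3$, giving $2^{-2} \cdot 1/3 = 1/12$.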


Theorem 4.3.1 is the building block construction analogue of Theorem 4.2.1. The following lemma is useful
in the proof of the analogue of Theorem 4.2.2.

Lemma 4.3.2 Let $I$ be an lfGA individual space over genic alphabet $A$ with nominal string length $\ell$ such
that $1 < \mathrm{card}(A) < \infty$, $\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of
$\phi_\beta$, and $k = \mathrm{card}(\mathcal{L}_\beta)$. If $X$ is randomly drawn from population $P(t)$ such that

\[ x \in I(\lambda) \implies \Pr[X = x] = \psi(\lambda; t-1) \cdot [N(\lambda)]^{-1} , \tag{11} \]

where $\psi$ is the filtering schedule and $N(\lambda)$ is defined in Section 5.3.1, then for each $\hat{k} \in \{1, \ldots, k\}$,

\[ \Pr[X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] = \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot h(k - \hat{k}; \lambda, k, \ell) . \]


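Proof: Because $I_\beta(\lambda)$ and $I_{\neg\beta}(\lambda)$ form a partition of $I(\lambda)$,

\[ \Pr[X \in I_{\neg\beta}(\lambda)] = \Pr[X \in I(\lambda)] - \Pr[X \in I_\beta(\lambda)] = \psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)] . \]

Suppose that $\Pr[X \in I_{\neg\beta}(\lambda)] = 0$. Then

\[ \begin{aligned} \Pr[X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] &= 0 \\ &= \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} \cdot 0 \cdot h(k - \hat{k}; \lambda, k, \ell) \\ &= \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot h(k - \hat{k}; \lambda, k, \ell) . \end{aligned} \]

On the other hand, if $\Pr[X \in I_{\neg\beta}(\lambda)] \ne 0$ then

\[ \begin{aligned} \Pr[X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] &= \Pr[X \in I_{\neg\beta}(\lambda)] \cdot \Pr[\hat{K}(X) = \hat{k} \mid X \in I_{\neg\beta}(\lambda)] \\ &= \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot \Pr[\hat{K}(X) = \hat{k} \mid X \in I_{\neg\beta}(\lambda)] . \end{aligned} \]

By the definition of a conditional probability,

\[ \Pr[\hat{K}(X) = \hat{k} \mid X \in I_{\neg\beta}(\lambda)] = \frac{\Pr[\hat{K}(X) = \hat{k} \wedge X \in I_{\neg\beta}(\lambda)]}{\Pr[X \in I_{\neg\beta}(\lambda)]} . \]

For $X$ sampled according to Equation 11 and $\hat{k} > 0$,

\[ \begin{aligned} \Pr[\hat{K}(X) = \hat{k} \mid X \in I_{\neg\beta}(\lambda)] &= \frac{\mathrm{card}(\{x \in I_{\neg\beta}(\lambda) : \hat{K}(x) = \hat{k}\})}{N_{\neg\beta}(\lambda)} \\ &= \frac{[\mathrm{card}(A)]^{\lambda} \binom{k}{k - \hat{k}}\binom{\ell - k}{\lambda - k + \hat{k}}}{[\mathrm{card}(A)]^{\lambda}\binom{\ell}{\lambda} - [\mathrm{card}(A)]^{\lambda - k}\binom{\ell - k}{\lambda - k}} \\ &= h(k - \hat{k}; \lambda, k, \ell) \left[ 1 - [\mathrm{card}(A)]^{-k}\, \frac{\binom{\ell - k}{\lambda - k}}{\binom{\ell}{\lambda}} \right]^{-1} \\ &= h(k - \hat{k}; \lambda, k, \ell) \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} , \end{aligned} \]

which completes the proof. ■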
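
The counting argument in the proof of Lemma 4.3.2 can be checked by brute force on a small instance. The Python sketch below enumerates every individual of a given length and compares the observed fraction with the closed form. It rests on assumptions made purely for illustration: individuals are not over-specified (each locus appears at most once), the building block occupies loci $0, \ldots, k-1$ with optimal allele 0, and all function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def hypergeom_pmf(x: int, n: int, M: int, N: int) -> float:
    """h(x; n, M, N) of Equation 8, returning 0 outside the support."""
    if x < 0 or n < 0 or M < 0 or N < 0 or n > N or M > N:
        return 0.0
    if x > M or n - x > N - M or n - x < 0:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def brute_force_conditional(ell, lam, k, c, k_missing):
    """Fraction of length-lam individuals lacking the building block that
    leave exactly k_missing of its k loci unspecified (uniform sampling)."""
    lacking = hits = 0
    for loci in combinations(range(ell), lam):
        for alleles in product(range(c), repeat=lam):
            genes = dict(zip(loci, alleles))
            # The block is present only if loci 0..k-1 all carry allele 0.
            if all(genes.get(locus) == 0 for locus in range(k)):
                continue
            lacking += 1
            missing = sum(1 for locus in range(k) if locus not in genes)
            if missing == k_missing:
                hits += 1
    return hits / lacking


def lemma_conditional(ell, lam, k, c, k_missing):
    """Closed form from the proof: h(k - k_missing; lam, k, ell) divided by
    1 - c^(-k) h(k; lam, k, ell)."""
    return (hypergeom_pmf(k - k_missing, lam, k, ell)
            / (1.0 - c ** (-k) * hypergeom_pmf(k, lam, k, ell)))
```

For $\ell = 5$, $\lambda = 3$, $k = 2$, and a binary alphabet, 74 of the 80 individuals lack the building block; 48 of those are missing one locus and 8 are missing two, matching the closed form values $0.6/0.925$ and $0.1/0.925$.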

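The final theorem of this section gives the total probability that an individual lacks a particular building
block before filtering, contains the building block after filtering, and is of particular lengths before and after
filtering.

Theorem 4.3.3 (Building block presence only after filtering) Let $I$ be an lfGA individual space over
genic alphabet $A$ with nominal string length $\ell$ such that $1 < \mathrm{card}(A) < \infty$, $\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a
separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of $\phi_\beta$, and $k = \mathrm{card}(\mathcal{L}_\beta)$. If $X$ is randomly drawn from
population $P(t)$ according to Equation 11 then

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda)] = \psi(\lambda'; t) \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot p_{cons}(k, \lambda', \lambda, \ell) , \]

where

\[ p_{cons}(k, \lambda', \lambda, \ell) = \bigl[[\mathrm{card}(A)]^{k} - h(k; \lambda, k, \ell)\bigr]^{-1} \sum_{\hat{k}=1}^{k} h(k - \hat{k}; \lambda, k, \ell) \cdot \begin{cases} 0 , & \text{if } 0 \le \lambda' < \lambda + \hat{k} \\ h(\hat{k}; \lambda' - \lambda, \hat{k}, \ell - \lambda) , & \text{if } \lambda + \hat{k} \le \lambda' \le \ell \end{cases} \]

Here the case conditions are applied to each term of the sum separately, as in the proof below.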

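Proof: By the definition of a local BBF operator, the event $m'(X) \in I(\lambda')$ is independent of the event
$X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}$, i.e.

\[ \begin{aligned} \Pr[m'(X) \in I(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] &= \Pr[m'(X) \in I(\lambda')] \cdot \Pr[X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] \\ &= \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} \psi(\lambda'; t) \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot h(k - \hat{k}; \lambda, k, \ell) , \end{aligned} \]

where we have used Lemma 4.3.2. By Theorem 4.3.1,

\[ \begin{aligned} &\Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] \\ &\quad = \Pr[m'(X) \in I_\beta \wedge m'(X) \in I(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] \\ &\quad = \Pr[m'(X) \in I_\beta \mid m'(X) \in I(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] \cdot \Pr[m'(X) \in I(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] \\ &\quad = \begin{cases} 0 , & \text{if } 0 \le \lambda' < \lambda + \hat{k} \\ [\mathrm{card}(A)]^{-k} h(\hat{k}; \lambda' - \lambda, \hat{k}, \ell - \lambda) \cdot \bigl[1 - [\mathrm{card}(A)]^{-k} h(k; \lambda, k, \ell)\bigr]^{-1} \\ \quad \cdot\, \psi(\lambda'; t) \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot h(k - \hat{k}; \lambda, k, \ell) , & \text{if } \lambda + \hat{k} \le \lambda' \le \ell \end{cases} \\ &\quad = \begin{cases} 0 , & \text{if } 0 \le \lambda' < \lambda + \hat{k} \\ \bigl[[\mathrm{card}(A)]^{k} - h(k; \lambda, k, \ell)\bigr]^{-1} \cdot \psi(\lambda'; t) \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \\ \quad \cdot\, h(\hat{k}; \lambda' - \lambda, \hat{k}, \ell - \lambda) \cdot h(k - \hat{k}; \lambda, k, \ell) , & \text{if } \lambda + \hat{k} \le \lambda' \le \ell \end{cases} \end{aligned} \]

Finally, by the Law of Total Probability and the observation that

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = 0] = 0 , \]

it follows that

\[ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda)] = \sum_{\hat{k}=1}^{k} \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda) \wedge \hat{K}(X) = \hat{k}] , \]

which completes the proof. ■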


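4.4 Total Probability of Building Block Presence

For a probabilistic BBF operator, individuals which are not of the same length before filtering may be
of the same length after filtering. The following theorem gives the total probability that after filtering an
individual is of length $\lambda'$ and contains building block $\beta$.

Theorem 4.4.1 (Total probability of building block presence)
Let $I$ be an lfGA individual space over genic alphabet $A$ with nominal string length $\ell$ such that $1 < \mathrm{card}(A) < \infty$,
$\Phi = \sum_i \phi_i : I \times I_P \to \mathbb{R}$ a separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of $\phi_\beta$, and $k = \mathrm{card}(\mathcal{L}_\beta)$.
If $X$ is randomly drawn from population $P(t)$ according to Equation 11 then

\[ \Pr[m'(X) \in I_\beta(\lambda')] = \psi(\lambda'; t) \sum_{\lambda=0}^{\ell} \Bigl( \Pr[X \in I_\beta(\lambda)] \cdot p_{surv}(k, \lambda', \lambda, \ell) + \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot p_{cons}(k, \lambda', \lambda, \ell) \Bigr) . \]

Proof: The $I(\lambda)$'s form a partition of $I$. Furthermore, $I_\beta(\lambda)$ and $I_{\neg\beta}(\lambda)$ form a partition of $I(\lambda)$. Thus,
by the Law of Total Probability, the probability that an individual contains building block $\beta$ after filtering
is

\[ \begin{aligned} \Pr[m'(X) \in I_\beta(\lambda')] &= \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I] \\ &= \sum_{\lambda=0}^{\ell} \bigl\{ \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_\beta(\lambda)] + \Pr[m'(X) \in I_\beta(\lambda') \wedge X \in I_{\neg\beta}(\lambda)] \bigr\} \\ &= \sum_{\lambda=0}^{\ell} \Bigl\{ \psi(\lambda'; t) \cdot \Pr[X \in I_\beta(\lambda)] \cdot p_{surv}(k, \lambda', \lambda, \ell) + \psi(\lambda'; t) \cdot \bigl\{\psi(\lambda; t-1) - \Pr[X \in I_\beta(\lambda)]\bigr\} \cdot p_{cons}(k, \lambda', \lambda, \ell) \Bigr\} , \end{aligned} \]

where we have used Theorems 4.2.2 and 4.3.3. ■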
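
The total probability of Theorem 4.4.1 combines the survival and construction terms, and a short numerical sketch makes the bookkeeping concrete. The Python below is a minimal illustration under stated assumptions: `psi_prev[lam]` stands for $\psi(\lambda; t-1)$, `p_prev[lam]` for $\Pr[X \in I_\beta(\lambda)]$, `psi_now` for $\psi(\lambda'; t)$, and `c` for $\mathrm{card}(A)$. All of these names are hypothetical, and the construction term uses the per-term case conditions from the proof of Theorem 4.3.3.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N) of Equation 8, returning 0 outside the support."""
    if x < 0 or n < 0 or M < 0 or N < 0 or n > N or M > N:
        return 0.0
    if x > M or n - x > N - M or n - x < 0:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def p_surv(k, lam_after, lam_before):
    """Equation 9: survival probability of an existing building block."""
    if lam_after >= lam_before:
        return 1.0
    return hypergeom_pmf(k, lam_after, k, lam_before)


def p_cons(k, lam_after, lam_before, ell, c):
    """Construction probability of Theorem 4.3.3, summed over the number
    of missing building block loci."""
    scale = 1.0 / (c ** k - hypergeom_pmf(k, lam_before, k, ell))
    total = 0.0
    for k_missing in range(1, k + 1):
        if lam_before + k_missing <= lam_after:  # enough genes are added
            total += (hypergeom_pmf(k - k_missing, lam_before, k, ell)
                      * hypergeom_pmf(k_missing, lam_after - lam_before,
                                      k_missing, ell - lam_before))
    return scale * total


def presence_after(k, ell, c, lam_after, psi_now, psi_prev, p_prev):
    """Theorem 4.4.1: probability that a filtered individual has length
    lam_after and contains the building block."""
    total = 0.0
    for lam in range(ell + 1):
        lacking = psi_prev[lam] - p_prev[lam]  # Pr[X lacks block, length lam]
        total += (p_prev[lam] * p_surv(k, lam_after, lam)
                  + lacking * p_cons(k, lam_after, lam, ell, c))
    return psi_now * total
```

As a sanity check, a purely destructive step (all individuals of length 4 cut to length 3, with 20% of them containing an order-2 building block) reduces to the survival term alone: $0.2 \cdot \tfrac{1}{2} = 0.1$.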


4.5 Summary

The  building  block  filtering  (BBF)  operators  used  in  fast  messy  genetic  algorithms  delete  the  same 
fixed  number  of  genes  from  each  individual  in  a  particular  generation.  The  probabilistic  BBF  operators  used 
in generalized fast messy genetic algorithms either add or delete a random number of genes, determined
independently for each individual. Previous models are limited to deterministic BBF operators, and consequently
consider  only  the  probability  of  building  block  survival.  The  mathematical  model  developed  here  extends 
existing  analysis  of  building  block  survival  to  the  probabilistic  case,  and  incorporates  analysis  of  building 
block  construction  to  arrive  at  the  total  probability  of  building  block  presence  following  filtering.  Together 
with the binary tournament selection model developed in Chapter V, this model permits the prediction of
the expected effectiveness given a particular choice of exogenous parameters. The prediction of expected
effectiveness forms the basis for the parameter selection techniques proposed in Chapter VI.

V. Binary Tournament Selection with Probabilistic Thresholding

Previously proposed models of tournament selection (see Section 2.6.2.3) focus on either takeover time
(e.g. Goldberg and Deb [33]) or on selection intensity (e.g. Bäck [5] and Blickle and Thiele [11]). Those
models  in  the  latter  category  treat  all  individuals  as  belonging  to  a  single  class,  so  that  their  fitnesses  are 
independent  and  identically  distributed.  Neither  class  of  models  provides  information  regarding  the  relative 
growth  of  one  (multiple  individual)  class  of  individuals  with  respect  to  another.  Furthermore,  no  previous 
model  of  tournament  selection  considers  thresholding. 

This  chapter  develops  a  model  of  binary  tournament  selection  (BTS)  with  probabilistic  thresholding 
which  treats  individuals  as  belonging  to  one  of  two  classes  with  possibly  differing  fitness  distributions.  The 
model  is  applicable  to  evolutionary  algorithms  using  binary  tournament  selection  with  thresholding  where 
the  more  fit  individual  is  selected  with  probability  1.  The  models  of  binary  tournament  selection  and 
probabilistic  building  block  filtering  (developed  in  Chapter  IV)  allow  the  prediction  of  expected  effectiveness 
resulting  from  a  choice  of  filtering  and  thresholding  parameters.  The  prediction  of  expected  effectiveness 
serves  as  the  foundation  for  the  exogenous  parameter  selection  techniques  proposed  in  Chapter  VI. 

One key component of the proposed tournament selection model is the probability of "correct decision
making," defined and analyzed in Section 5.1. A dynamical systems model of BTS with probabilistic thresholding
is developed in Section 5.2 using Markov chain analysis. The distribution after selection depends on
the  initial  distribution,  which  is  examined  in  Section  5.3  for  the  case  of  a  uniformly  distributed  population 
in  a  linkage-friendly  genetic  algorithm.  Finally,  Section  5.4  discusses  the  application  of  the  model  to  the 
prediction  of  building  block  processing  in  a  fast  messy  genetic  algorithm. 

5.1  Decision  Making 

Each (nontrivial) tournament performed in the application of a binary tournament selection operator
may be viewed as a decision between two classes of individuals $A, B \subset I$ where one competing individual
belongs to $A$ and the other to $B$.^1 This section develops a probabilistic model of "correct" decision making
as a function of

• the fitness distributions $F$ and $G$ of those initial population individuals belonging to $A$ and $B$, respectively;

• the number of ancestors which belong to $A$ and $B$ for each of the competing individuals;

• the individual similarity $D = d(a, b)$ between the competing individuals $a \in A$ and $b \in B$ (typically
the number of common defining loci); and

• the conditional fitness distributions $F_\Omega$ and $G_\Omega$ of those individuals belonging to $A$ and $B$, respectively,
given that the pair belongs to the set $\Omega(D)$ of such individual pairs also having individual similarity
$D$.

Section 5.1.1 defines order statistics, upon which the model developed in this section is based. The section
also extends the standard theory to obtain the distribution of the maximal order statistic of a set of random
variables only some of which are identically distributed. This result is applied to derive the probabilities of
correct decision making in the absence (Section 5.1.2) and in the presence (Section 5.1.3) of thresholding.

5.1.1 Order Statistics. Arnold et al. define order statistics as follows:

Definition 5.1.1 (Order statistic (Arnold et al. [4])): Suppose that $(X_1, \ldots, X_n)$ are $n$ jointly
distributed random variables. The corresponding order statistics are the $X_i$'s arranged in nondecreasing
order. The smallest of the $X_i$'s is denoted by $X_{1:n}$, the second smallest is denoted by $X_{2:n}$, ..., and, finally,
the largest is denoted by $X_{n:n}$. Thus $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$. □

^1 It is convenient to think of $A$ and $B$ as forming a partition of the individual space $I$ so that $A \cap B = \{\}$ and $A \cup B = I$,
but only the disjointness condition is necessary to the decision making model developed here.

For the case that the $X_i$'s are independent and identically distributed, it is well known that the
distribution function of $X_{i:n}$ is

\[ F_{i:n}(x) = \sum_{r=i}^{n} \binom{n}{r} [F(x)]^{r} [1 - F(x)]^{n-r} \]

and that the density function is

\[ f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x) , \]

where $F$ is the distribution function of the $X_i$'s and $f$ is the density function. For many of the order statistics
which appear in the subsequent analysis, the underlying random variables are not identically distributed,
but are partitioned into two sets within each of which the random variables are identically distributed. Of
particular interest is the conditional distribution of the maximal order statistic, given that its underlying
random variable possesses a particular distribution. This statistic is shown in Sections 5.1.2 and 5.1.3 to be
related to the fitness distributions of individuals surviving tournament selection.
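
For the independent and identically distributed case quoted above, the distribution function of the $i$th order statistic is a binomial tail sum and is easy to evaluate. The following Python sketch is illustrative only; the function name `order_stat_cdf` and its argument `Fx` (the value $F(x)$ at the point of interest) are hypothetical.

```python
from math import comb


def order_stat_cdf(i: int, n: int, Fx: float) -> float:
    """F_{i:n}(x) for i.i.d. variables, given Fx = F(x): the probability
    that at least i of the n variables are less than or equal to x."""
    return sum(comb(n, r) * Fx ** r * (1.0 - Fx) ** (n - r)
               for r in range(i, n + 1))
```

Two familiar special cases fall out directly: the maximum satisfies $F_{n:n}(x) = [F(x)]^n$ and the minimum satisfies $F_{1:n}(x) = 1 - [1 - F(x)]^n$.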

Theorem 5.1.2 Let $(X_1, \ldots, X_{n_X})$ be $n_X \ge 0$ identically distributed random variables with density function
$f$ and distribution function $F$. Also, let $(Y_1, \ldots, Y_{n_Y})$ be $n_Y \ge 0$ identically distributed random variables with
density function $g$ and distribution function $G$. Finally, let $Z$ be a random variable with density function
$f_\Omega$. If the $X_i$'s, the $Y_i$'s, and $Z$ are mutually independent, then the conditional distribution function of $Z$
given that $n_X > 0 \implies Z \ge X_{n_X:n_X}$ and $n_Y > 0 \implies Z \ge Y_{n_Y:n_Y}$ is

\[ H(t) = K \int_{-\infty}^{t} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz , \]

and the conditional density function is

\[ h(t) = K f_\Omega(t) [F(t)]^{n_X} [G(t)]^{n_Y} , \]

where

\[ K \triangleq \frac{1}{\int_{-\infty}^{\infty} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz} . \]

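Proof: Suppose first that $n_X > 0$ and $n_Y > 0$. The joint density function of $Z$ and the order statistics
$X_{1:n_X}, \ldots, X_{n_X:n_X}, Y_{1:n_Y}, \ldots, Y_{n_Y:n_Y}$ may be found by transformation of variables [41] and their mutual
independence to be

\[ h_{1:n_X, \ldots, n_X:n_X;\, 1:n_Y, \ldots, n_Y:n_Y;\, Z}(x_1, \ldots, x_{n_X}, y_1, \ldots, y_{n_Y}, z) = n_X! \left[ \prod_{r=1}^{n_X} f(x_r) \right] n_Y! \left[ \prod_{s=1}^{n_Y} g(y_s) \right] f_\Omega(z) . \]

The joint density function of $X_{n_X:n_X}$, $Y_{n_Y:n_Y}$, and $Z$ is found by "integrating out" the remaining variables,
where the limits of integration are determined by the definitions of the order statistics:

\[ \begin{aligned} h_{n_X:n_X;\, n_Y:n_Y;\, Z}(x_{n_X}, y_{n_Y}, z) &= n_X!\, n_Y!\, f(x_{n_X})\, g(y_{n_Y})\, f_\Omega(z) \\ &\quad \cdot \int_{-\infty}^{x_{n_X}} \cdots \int_{-\infty}^{x_2} f(x_1) \cdots f(x_{n_X - 1}) \, dx_1 \cdots dx_{n_X - 1} \\ &\quad \cdot \int_{-\infty}^{y_{n_Y}} \cdots \int_{-\infty}^{y_2} g(y_1) \cdots g(y_{n_Y - 1}) \, dy_1 \cdots dy_{n_Y - 1} \\ &= n_X!\, n_Y!\, f(x_{n_X})\, g(y_{n_Y})\, f_\Omega(z) \cdot \frac{[F(x_{n_X})]^{n_X - 1}}{(n_X - 1)!} \cdot \frac{[G(y_{n_Y})]^{n_Y - 1}}{(n_Y - 1)!} \\ &= n_X n_Y f(x_{n_X})\, g(y_{n_Y})\, f_\Omega(z)\, [F(x_{n_X})]^{n_X - 1} [G(y_{n_Y})]^{n_Y - 1} . \end{aligned} \]

The conditional distribution function is thus

\[ \begin{aligned} H(t) &= \Pr[Z \le t \mid Z \ge X_{n_X:n_X} \wedge Z \ge Y_{n_Y:n_Y}] \\ &= \frac{\Pr[Z \le t \wedge Z \ge X_{n_X:n_X} \wedge Z \ge Y_{n_Y:n_Y}]}{\Pr[Z \ge X_{n_X:n_X} \wedge Z \ge Y_{n_Y:n_Y}]} \\ &= \frac{\int_{-\infty}^{t} \int_{-\infty}^{z} \int_{-\infty}^{z} h_{n_X:n_X;\, n_Y:n_Y;\, Z}(x, y, z) \, dx \, dy \, dz}{\int_{-\infty}^{\infty} \int_{-\infty}^{z} \int_{-\infty}^{z} h_{n_X:n_X;\, n_Y:n_Y;\, Z}(x, y, z) \, dx \, dy \, dz} \\ &= \frac{\int_{-\infty}^{t} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz}{\int_{-\infty}^{\infty} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz} . \end{aligned} \]

Now suppose that $n_X > 0$ and $n_Y = 0$. Then the joint density function of $X_{n_X:n_X}$ and $Z$ is

\[ h_{n_X:n_X;\, Z}(x_{n_X}, z) = n_X f(x_{n_X})\, f_\Omega(z)\, [F(x_{n_X})]^{n_X - 1} , \]

and the conditional distribution function is

\[ \begin{aligned} H(t) &= \frac{\int_{-\infty}^{t} \int_{-\infty}^{z} h_{n_X:n_X;\, Z}(x, z) \, dx \, dz}{\int_{-\infty}^{\infty} \int_{-\infty}^{z} h_{n_X:n_X;\, Z}(x, z) \, dx \, dz} \\ &= \frac{\int_{-\infty}^{t} f_\Omega(z) [F(z)]^{n_X} \, dz}{\int_{-\infty}^{\infty} f_\Omega(z) [F(z)]^{n_X} \, dz} \\ &= \frac{\int_{-\infty}^{t} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz}{\int_{-\infty}^{\infty} f_\Omega(z) [F(z)]^{n_X} [G(z)]^{n_Y} \, dz} . \end{aligned} \]

Likewise, for $n_X = 0$ and $n_Y > 0$ the joint density function is

\[ h_{n_Y:n_Y;\, Z}(y_{n_Y}, z) = n_Y g(y_{n_Y})\, f_\Omega(z)\, [G(y_{n_Y})]^{n_Y - 1} , \]

and the same conditional distribution function is obtained. Finally, for $n_X = n_Y = 0$, the density function
is just $f_\Omega$, and again the same conditional distribution function is obtained. It remains only to note that the
conditional density function $h$ is the unique function satisfying $H(t) = \int_{-\infty}^{t} h(s) \, ds$. ■

5.1.2 Decision Making With Trivial Thresholding. Define the random variables $X = \Phi_c(a)$ and
$Y = \Phi_c(b)$, where $a \in A$ and $b \in B$ are randomly chosen individuals. The distribution functions $F$ and $G$ of
$X$ and $Y$ respectively are called the fitness distribution functions or simply fitness distributions. When they
exist, the corresponding density functions $f$ and $g$ are called the fitness density functions or fitness densities.
To each of these definitions corresponds an obvious "conditional" counterpart.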

One type of conditional fitness distribution of interest is that for an individual with a given number
of ancestors belonging to a particular class, since these are the fitness distributions of individuals surviving
selection. Informally, an individual's ancestors are those individuals against which it competes, either
explicitly or implicitly. This is made precise by the following definition.

Definition 5.1.3 (Ancestor): Let $a \in P(t)$. If $t = 0$ and $a \in A$ then (in the population $P(0)$) $a$ possesses
one ancestor in $A$ (itself). If $t = 0$ and $a \notin A$ then (in the population $P(0)$) $a$ possesses no ancestors in $A$.
For $t > 0$:

• If in $P(t-1)$ $a$ possesses $n$ ancestors in $A$ and it is selected in a tournament in which no probabilistically
$\theta$-compatible second individual is found then (in the population $P(t)$) $a$ possesses $n$ ancestors in $A$ (the
ancestors of $a$ in $P(t-1)$).

• If in $P(t-1)$ $a$ possesses $m$ ancestors in $A$, where $0 \le m \le n$, and it is selected in a tournament in
which the second individual $b$ possesses $n - m$ ancestors in $A$ then (in the population $P(t)$) $a$ possesses
$n$ ancestors in $A$ (the ancestors of $a$ in $P(t-1)$ together with those of $b$ in $P(t-1)$).

□

Because the more fit individual wins each tournament with probability 1, every individual $a \in P(t)$ is
at least as fit as each of its ancestors. This straightforward observation leads to the following key element
of the decision making model, which relates the conditional fitness distribution of an individual to the
(unconditional) fitness distributions of its ancestors.

Corollary 5.1.4 Let $a \in A \subset I$, and let $B \subset I$ such that $A \cap B = \{\}$. Then the conditional fitness
distribution of $a$, given that

• $a$ has $n_X \ge 1$ ancestors in $A$ (including itself), has $n_Y \ge 0$ ancestors in $B$, and has no other ancestors;

• those ancestors in $A$ have fitness density $f$ and fitness distribution $F$;

• those ancestors in $B$ have fitness density $g$ and fitness distribution $G$; and

• the fitnesses are mutually independent

is

\[ H(t) = K \int_{-\infty}^{t} f(x) [F(x)]^{n_X - 1} [G(x)]^{n_Y} \, dx , \]

and the conditional density is

\[ h(t) = K f(t) [F(t)]^{n_X - 1} [G(t)]^{n_Y} , \]

where

\[ K \triangleq \frac{1}{\int_{-\infty}^{\infty} f(x) [F(x)]^{n_X - 1} [G(x)]^{n_Y} \, dx} . \]

Proof: Let $Z$ be the fitness of $a$, $(X_1, \ldots, X_{n_X - 1})$ the fitnesses of the remaining ancestors in $A$ (if any),
and $(Y_1, \ldots, Y_{n_Y})$ the fitnesses of those ancestors in $B$ (if any). Because $a$ is at least as fit as every ancestor,
it is in particular at least as fit as any ancestors in $A$. Hence, $n_X - 1 > 0 \implies Z \ge X_{n_X - 1:n_X - 1}$. Individual $a$ is also
at least as fit as those ancestors in $B$, so $n_Y > 0 \implies Z \ge Y_{n_Y:n_Y}$. The conclusion follows immediately from
Theorem  5.1.2.  ■ 
The next theorem states the probability of correct decision making as a function of the number of
ancestors belonging to each class for each of the competing individuals. Without loss of generality, this
research defines a "correct" decision to be one in which the individual in A is more fit than the one in B.

Theorem 5.1.5 Let $a \in A \subset I$ and $b \in B \subset I$, where $A \cap B = \{\}$. Also, suppose that

1. $a$ has $n_X^{(a)} \ge 1$ ancestors in $A$, has $n_Y^{(a)} \ge 0$ ancestors in $B$, and has no other ancestors;

2. $b$ has $n_X^{(b)} \ge 0$ ancestors in $A$, has $n_Y^{(b)} \ge 1$ ancestors in $B$, and has no other ancestors;

3. those ancestors in $A$ (of either $a$ or $b$) have fitness density $f$ and fitness distribution $F$;

4. those ancestors in $B$ (of either $a$ or $b$) have fitness density $g$ and fitness distribution $G$; and

5. the fitnesses are mutually independent.

Then the probability that $a$ is the more fit individual is

\[ K^{(a)} K^{(b)} \int_{-\infty}^{\infty} f(t)\,[F(t)]^{n_X^{(a)} - 1}\,[G(t)]^{n_Y^{(a)}} \int_{-\infty}^{t} g(s)\,[F(s)]^{n_X^{(b)}}\,[G(s)]^{n_Y^{(b)} - 1}\,ds\,dt , \]

where

\[ K^{(a)} \triangleq \frac{1}{\int_{-\infty}^{\infty} f(x)\,[F(x)]^{n_X^{(a)} - 1}\,[G(x)]^{n_Y^{(a)}}\,dx} \]

and

\[ K^{(b)} \triangleq \frac{1}{\int_{-\infty}^{\infty} g(x)\,[G(x)]^{n_Y^{(b)} - 1}\,[F(x)]^{n_X^{(b)}}\,dx} . \]

Proof: Let $h_A$ be the conditional fitness density of $a$ given conditions 1, 3, 4, and 5. Also, let $H_B$ be the
conditional fitness distribution of $b$ given conditions 2, 3, 4, and 5. Then the probability that $a$ is more fit
than $b$ is just $\int_{-\infty}^{\infty} h_A(t) H_B(t)\,dt$. By Corollary 5.1.4,

\[ h_A(t) = K^{(a)} f(t)\,[F(t)]^{n_X^{(a)} - 1}\,[G(t)]^{n_Y^{(a)}} \]

and

\[ H_B(t) = K^{(b)} \int_{-\infty}^{t} g(s)\,[G(s)]^{n_Y^{(b)} - 1}\,[F(s)]^{n_X^{(b)}}\,ds . \]

The result follows immediately. ■
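
The double integral of Theorem 5.1.5 is straightforward to approximate on a grid. The sketch below is illustrative only: the function name, the argument order, and the trapezoidal discretization are assumptions rather than anything prescribed by the analysis. It is checked against a case that can be worked by hand: if every ancestor has the same continuous fitness distribution, a is the maximum of m samples and b the maximum of k samples, so a is more fit with probability m/(m + k).

```python
from typing import Callable, List


def trapezoid(values: List[float], dt: float) -> float:
    """Composite trapezoidal rule on an evenly spaced grid."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def decision_probability(
    f: Callable[[float], float], F: Callable[[float], float],
    g: Callable[[float], float], G: Callable[[float], float],
    na_x: int, na_y: int, nb_x: int, nb_y: int,
    lo: float, hi: float, steps: int = 2000,
) -> float:
    """Approximate the probability that a (in A) is more fit than b (in B).

    na_x, na_y: numbers of a's ancestors in A (including a) and in B.
    nb_x, nb_y: numbers of b's ancestors in A and in B (including b).
    [lo, hi] is assumed to cover the support of both densities.
    """
    dt = (hi - lo) / steps
    ts = [lo + k * dt for k in range(steps + 1)]
    # Unnormalized conditional densities of a and b (Corollary 5.1.4).
    h_a = [f(t) * F(t) ** (na_x - 1) * G(t) ** na_y for t in ts]
    h_b = [g(t) * G(t) ** (nb_y - 1) * F(t) ** nb_x for t in ts]
    # Running trapezoidal integral of h_b gives the unnormalized H_B(t).
    big_h_b = [0.0]
    for k in range(steps):
        big_h_b.append(big_h_b[-1] + 0.5 * (h_b[k] + h_b[k + 1]) * dt)
    integrand = [ha * hb for ha, hb in zip(h_a, big_h_b)]
    # Dividing by both normalizing integrals applies K^(a) K^(b).
    return trapezoid(integrand, dt) / (trapezoid(h_a, dt) * big_h_b[-1])


# Hypothetical check with Uniform(0, 1) fitnesses: a has 3 ancestors, b has 1.
p = decision_probability(lambda x: 1.0, lambda x: x, lambda x: 1.0, lambda x: x,
                         3, 0, 0, 1, 0.0, 1.0)
print(round(p, 3))  # approximately 0.75
```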

5.1.3 Decision Making with Nontrivial Thresholding. This section extends the results of Section 5.1.2
to consider the effect of thresholding on the probability of correct decision making. Define the
set $\Omega(D)$ of those pairs of individuals $(a, b)$ for which $a$ belongs to $A$, $b$ belongs to $B$, and the individual
similarity is $D$. That is,

\[ \Omega(D) = \{ (a, b) \in A \times B : d(a, b) = D \} , \]

where $d$ is the individual similarity, and it is understood that $A, B \subset I$ with $A \cap B = \{\}$.

Corollary 5.1.6 Let $a \in A \subset I$, and let $B \subset I$ such that $A \cap B = \{\}$. Then the conditional fitness
distribution of $a$, given that

• $a$ has $n_X \ge 1$ ancestors in $A$ (including itself), has $n_Y \ge 0$ ancestors in $B$, has no other ancestors,
and has fitness density $f_\Omega$;

• those ancestors in $A$, excluding $a$ itself, have fitness density $f$ and fitness distribution $F$;

• those ancestors in $B$ have fitness density $g$ and fitness distribution $G$; and

• the fitnesses are mutually independent

is

\[ H(t) = K \int_{-\infty}^{t} f_\Omega(x)\,[F(x)]^{n_X - 1}\,[G(x)]^{n_Y}\,dx , \]

and the conditional density is

\[ h(t) = K f_\Omega(t)\,[F(t)]^{n_X - 1}\,[G(t)]^{n_Y} , \]

where

\[ K \triangleq \frac{1}{\int_{-\infty}^{\infty} f_\Omega(x)\,[F(x)]^{n_X - 1}\,[G(x)]^{n_Y}\,dx} . \]

Proof: Let $Z$ be the fitness of $a$, $(X_1, \ldots, X_{n_X - 1})$ the fitnesses of the remaining ancestors in $A$ (if any),
and $(Y_1, \ldots, Y_{n_Y})$ the fitnesses of those ancestors in $B$ (if any). Because $a$ is at least as fit as every ancestor,
it is in particular at least as fit as any ancestors in $A$. Hence, $n_X - 1 > 0 \Rightarrow Z \ge X_{n_X - 1 : n_X - 1}$. Also, $a$ is
at least as fit as those ancestors in $B$, so $n_Y > 0 \Rightarrow Z \ge Y_{n_Y : n_Y}$. The conclusion follows immediately from
Theorem 5.1.2. ■

The  next  theorem  states  the  probability  of  correct  decision  making  as  a  function  of  the  number  of 
ancestors  belonging  to  each  class  for  each  of  the  competing  individuals  and  the  similarity  of  the  competing 
individuals. 

Theorem 5.1.7 Let $(a, b) \in \Omega(D)$. Also, suppose that

1. $a$ has $n_X^{(a)} \ge 1$ ancestors in $A$, has $n_Y^{(a)} \ge 0$ ancestors in $B$, has no other ancestors, and has fitness
density $f_\Omega$;

2. $b$ has $n_X^{(b)} \ge 0$ ancestors in $A$, has $n_Y^{(b)} \ge 1$ ancestors in $B$, has no other ancestors, and has fitness
density $g_\Omega$;

3. those ancestors in $A$ (of either $a$ or $b$, but excluding $a$ itself) have fitness density $f$ and fitness
distribution $F$;

4. those ancestors in $B$ (of either $a$ or $b$, but excluding $b$ itself) have fitness density $g$ and fitness
distribution $G$; and

5. the fitnesses are mutually independent.

Then the probability that $a$ is the more fit individual is

\[ P_d(n_X^{(a)}, n_Y^{(a)}, n_X^{(b)}, n_Y^{(b)}, D) = K^{(a)} K^{(b)} \int_{-\infty}^{\infty} f_\Omega(t)\,[F(t)]^{n_X^{(a)} - 1}\,[G(t)]^{n_Y^{(a)}} \int_{-\infty}^{t} g_\Omega(s)\,[F(s)]^{n_X^{(b)}}\,[G(s)]^{n_Y^{(b)} - 1}\,ds\,dt , \]

where

\[ K^{(a)} \triangleq \frac{1}{\int_{-\infty}^{\infty} f_\Omega(x)\,[F(x)]^{n_X^{(a)} - 1}\,[G(x)]^{n_Y^{(a)}}\,dx} \]

and

\[ K^{(b)} \triangleq \frac{1}{\int_{-\infty}^{\infty} g_\Omega(x)\,[G(x)]^{n_Y^{(b)} - 1}\,[F(x)]^{n_X^{(b)}}\,dx} . \]

Proof: Let $h_A$ be the conditional fitness density of $a$ given conditions 1, 3, 4, and 5. Also, let $H_B$ be the
conditional fitness distribution of $b$ given conditions 2, 3, 4, and 5. Then the probability that $a$ is more fit
than $b$ is just $\int_{-\infty}^{\infty} h_A(t) H_B(t)\,dt$. By Corollary 5.1.6,

\[ h_A(t) = K^{(a)} f_\Omega(t)\,[F(t)]^{n_X^{(a)} - 1}\,[G(t)]^{n_Y^{(a)}} \]

and

\[ H_B(t) = K^{(b)} \int_{-\infty}^{t} g_\Omega(s)\,[G(s)]^{n_Y^{(b)} - 1}\,[F(s)]^{n_X^{(b)}}\,ds . \]

The result follows immediately. ■

5.2 Dynamical Systems Model


In this section, the expected distribution of individuals between competing classes after selection is
obtained. Specifically, the section investigates the effects of binary tournament selection (BTS) with
probabilistic thresholding on the distribution of individuals belonging to classes $A$ and $B$, where
$A \cap B = \{\}$ and $A \cup B = I$.

As in the analysis of building block filtering, the population is modeled via a population vector based
on that proposed by Liepins and Vose [49]. The model is more general than that of Liepins and Vose in that
the components of the population vector define a density function over a set of equivalence classes which
form a partition of the augmented individual space $I \times \mathbb{N} \times \mathbb{N}$.

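Specifically, the population vector is of the form

\[ \mathbf{p} \triangleq \left( \mathbf{p}^{(A)},\ \mathbf{p}^{(B)} \right) , \tag{12} \]

where

\[ \mathbf{p}^{(A)} \triangleq \begin{pmatrix} p^{(A_{00})} & \cdots & p^{(A_{0j})} \\ \vdots & & \vdots \\ p^{(A_{i0})} & \cdots & p^{(A_{ij})} \end{pmatrix} \]

and

\[ \mathbf{p}^{(B)} \triangleq \begin{pmatrix} p^{(B_{00})} & \cdots & p^{(B_{0j})} \\ \vdots & & \vdots \\ p^{(B_{i0})} & \cdots & p^{(B_{ij})} \end{pmatrix} . \]

Each component $p^{(A_{ij})}$ (resp. $p^{(B_{ij})}$) is the probability that an individual sampled from the population
belongs to the equivalence class $A_{ij}$ (resp. $B_{ij}$) of the augmented individual space, consisting of those
individuals which are in $A$ (resp. $B$), have $i$ ancestors in $A$, and have $j$ ancestors in $B$.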

The effect of tournament selection on the population is modeled as a deterministic transition function
$\tau_s$ mapping the current population vector $\mathbf{p}$ to the expected next population vector $\tau_s(\mathbf{p})$. By the law of
large numbers, the next population vector in an infinite population algorithm exactly matches the expected
population vector. Thus, the model developed here is exact for such algorithms.

Furthermore, as demonstrated by Vose and Wright [74], the transition function $\tau_s$ is independent
of the population size, assuming that each member of the next population results from an independent,
identically distributed choice. For binary tournament selection, this assumption holds when the competing
individuals are selected with replacement, regardless of population size. Hence, the model is also an exact
expected value model for such algorithms.

Finally, for trivial thresholding (i.e. when all individuals are θ-compatible with probability 1), the
expected population is independent of whether individuals are selected with or without replacement. Therefore,
the model is also an exact expected value model for these algorithms.

The model is not necessarily exact for finite population size algorithms with nontrivial thresholding in
which individuals are selected without replacement. Consider the case in which the most fit individual is
θ-compatible with probability 1 with at least three other individuals, all other individuals are probabilistically
θ-compatible with probability 0 with each other, and the shuffle size is very large. If individuals are selected
with replacement, then the best individual is expected to compete in and win at least three tournaments.
On the other hand, if individuals are selected without replacement, then the best individual is expected
to compete in only two tournaments. Thus, the expected population under selection without replacement
differs from the expected population under selection with replacement, which is predicted exactly by the
proposed model.




In  summary,  the  proposed  model  is  exact  (at  least  in  expectation)  for  algorithms  having  any  of  the 
following  properties: 

•  infinite  population  size, 

•  competing  individuals  are  selected  with  replacement,  or 

•  trivial  thresholding. 

Each tournament conducted by BTS with probabilistic thresholding may be viewed as a Markov chain, which
this section analyzes at successively decreasing levels of abstraction (increasing levels of detail). The more
abstract Markov chains (Section 5.2.1) view each tournament as a sequence of two state transitions: one
corresponding to selection of the first individual, the other to the selection of the second individual and
determination of the winner. The less abstract (more detailed) Markov chains (Section 5.2.2) focus on the
state transitions required to search for a θ-compatible second individual before a winner can be determined.

5.2.1 High-level Markov Chain. The most abstract view of a binary tournament (of those considered
here), which is called MC-0, is represented as a state transition diagram in Figure 20. The five states of
MC-0 are:

• $s_0$: Initial state.

• $s^{(A)}$: The first individual is in $A$.¹

• $s^{(B)}$: The first individual is in $B$.

• $s_A$: The winner is in $A$.

• $s_B$: The winner is in $B$.

The transition probability from $s_0$ to $s^{(A)}$ is the probability $p^{(A)}(t) = \sum_i \sum_j p^{(A_{ij})}(t)$ that an individual
drawn from the current population $P(t)$ is in $A$. For the initial iteration of selection ($t = 0$), this probability
is determined entirely by the distribution from which the initial population $P(0)$ is drawn (Section 5.3).
Thereafter, it is determined by the following recurrence relation.

¹Throughout this section, single superscripts refer to the first individual. The first (second) element of an ordered pair
superscript refers to the first (second) individual. Finally, subscripts refer to the tournament winner.

Figure 20. MC-0: High-level Markov Chain Model of a Binary Tournament.

Theorem 5.2.1 Let $\alpha_A^{(A)}(t)$ be the conditional probability that the winner of a tournament in generation $t$
is in $A$ given that the first individual is in $A$. Also, let $\alpha_A^{(B)}(t)$ be the conditional probability that the winner
of a tournament in generation $t$ is in $A$ given that the first individual is in $B$. Finally, let $a$ be randomly
drawn from the population $P(t + 1)$, where $t \ge 0$. Then the probability that $a \in A$ is

\[ p^{(A)}(t + 1) = \left[ \alpha_A^{(A)}(t) - \alpha_A^{(B)}(t) \right] p^{(A)}(t) + \alpha_A^{(B)}(t) . \]

Proof: The probability that an individual drawn from population $P(t + 1)$ is in $A$ is just the probability
that the winner of a tournament in generation $t$ is in $A$. The latter is the absorption probability from $s_0$
into $s_A$ in generation $t$, so that

\[ p^{(A)}(t + 1) = p^{(A)}(t)\,\alpha_A^{(A)}(t) + \left[ 1 - p^{(A)}(t) \right] \alpha_A^{(B)}(t) = \left[ \alpha_A^{(A)}(t) - \alpha_A^{(B)}(t) \right] p^{(A)}(t) + \alpha_A^{(B)}(t) . \]

■

Due to symmetry, an analogous relation holds for $p^{(B)}(t + 1)$.
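
The recurrence of Theorem 5.2.1 is a one-line update once the two conditional winning probabilities are known. The sketch below is a hedged illustration: the function names are hypothetical, and the second function adds an idealized assumption that is not part of the theorem, namely that every individual in A is more fit than every individual in B, thresholding is trivial, and competitors are drawn with replacement. Under that assumption the winner is in A with probability 1 when the first individual is in A, and with probability p^(A)(t) (the chance that the second individual is in A) when the first individual is in B.

```python
from typing import List


def next_share(p_a: float, alpha_a_given_a: float, alpha_a_given_b: float) -> float:
    """One step of the recurrence for the proportion of class A.

    alpha_a_given_a: P(winner in A | first individual in A)
    alpha_a_given_b: P(winner in A | first individual in B)
    """
    return (alpha_a_given_a - alpha_a_given_b) * p_a + alpha_a_given_b


def iterate_idealized(p0: float, generations: int) -> List[float]:
    """Iterate the recurrence under the idealized assumption that A always wins.

    This assumption is for illustration only; in general the two conditional
    probabilities come from the more detailed Markov chains.
    """
    shares = [p0]
    for _ in range(generations):
        p = shares[-1]
        shares.append(next_share(p, 1.0, p))
    return shares


print(iterate_idealized(0.1, 3))
```

In this idealized case the update reduces to p(t + 1) = 2p(t) − p(t)², so the proportion of class B is squared each generation.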

The transition probability $\alpha_A^{(A)}(t)$ is the conditional probability that the winner of a tournament in
generation $t$ is in $A$ given that the first individual is in $A$, and $\alpha_A^{(B)}(t)$ is the corresponding probability given
that the first individual is in $B$. Both depend on the probabilities of correct decision making, and thus on
the number $n_A$ of the first individual's ancestors in $A$ and the number $n_B$ in $B$ (see Section 5.1). Thus, it
is necessary to consider the less abstract (more refined) Markov chain MC-1 which explicitly depicts these
dependencies (see Figure 21). The states of MC-1 are:

• $s_0$: Initial state.

• $s^{(A_{n_A n_B})}$, $n_A \ge 1$, $n_B \ge 0$, $n_A + n_B \le 2^t$: The first individual is in $A$, has $n_A$ ancestors in $A$, and has
$n_B$ ancestors in $B$.

• $s^{(B_{n_A n_B})}$, $n_A \ge 0$, $n_B \ge 1$, $n_A + n_B \le 2^t$: The first individual is in $B$, has $n_A$ ancestors in $A$, and has
$n_B$ ancestors in $B$.

• $s_{A_{ij}}$, $i \ge 1$, $j \ge 0$, $i + j \le 2^{t+1}$: The winner is in $A$, has $i$ ancestors in $A$, and has $j$ ancestors in $B$.

• $s_{B_{ij}}$, $i \ge 0$, $j \ge 1$, $i + j \le 2^{t+1}$: The winner is in $B$, has $i$ ancestors in $A$, and has $j$ ancestors in $B$.

As previously mentioned, the transition probability from $s_0$ to $s^{(A_{n_A n_B})}$ is the probability $p^{(A_{n_A n_B})}(t)$ that
an individual drawn from the population $P(t)$ is in $A$, has $n_A$ ancestors in $A$, and has $n_B$ ancestors in $B$.
Like $p^{(A)}(t)$, it satisfies a recurrence relation, given in the following theorem. Again, the initial condition is
determined by the distribution from which the initial population $P(0)$ is drawn.

Figure 21. MC-1: High-level Markov chain model of a binary tournament explicitly depicting the
dependence of the transition probabilities on the first individual's ancestors.

Theorem 5.2.2 Let $p^{(A)}(0)$ be the probability that an individual drawn from the initial population $P(0)$ is
in $A$. Also, let $a$ be drawn randomly from $P(0)$. Then the probability that $a \in A_{ij}$ is

\[ p^{(A_{ij})}(0) = \delta_{i1}\,\delta_{j0}\,p^{(A)}(0) . \]

Furthermore, let $\alpha_{A_{ij}}^{(A_{n_A n_B})}(t)$ be the conditional probability that the winner of a tournament in generation $t$
is in $A_{ij}$ given that the first individual is in $A_{n_A n_B}$, and let $\alpha_{A_{ij}}^{(B_{n_A n_B})}(t)$ be the conditional probability that
the winner of a tournament in generation $t$ is in $A_{ij}$ given that the first individual is in $B_{n_A n_B}$. Finally, let
$a$ be drawn randomly from $P(t + 1)$, where $t \ge 0$. Then the probability that $a \in A_{ij}$ is

\[ p^{(A_{ij})}(t + 1) = \sum_{n_A} \sum_{n_B} \left[ p^{(A_{n_A n_B})}(t)\,\alpha_{A_{ij}}^{(A_{n_A n_B})}(t) + p^{(B_{n_A n_B})}(t)\,\alpha_{A_{ij}}^{(B_{n_A n_B})}(t) \right] . \]

Proof: Every individual in the initial population $P(0)$ has exactly one ancestor (itself). Also, the
probability that an individual drawn from population $P(t + 1)$ is in $A_{ij}$ is just the probability that the winner of a
tournament in generation $t$ is in $A_{ij}$. The latter is the absorption probability from $s_0$ into $s_{A_{ij}}$ in generation
$t$. ■

Again, due to symmetry, an analogous result holds for the $p^{(B_{ij})}(t)$.

5.2.2 Low-level Markov Chain. The transition from $s^{(A_{n_A n_B})}$ to $s_{A_{ij}}$ or $s_{B_{ij}}$ in MC-1 involves the
intermediate steps required to search for a θ-compatible second individual. These steps form a Markov chain
MC-2 for which some of the transition probabilities depend on $n_A$ and $n_B$. Figure 22 shows MC-2, which
may be viewed as a fragment of a still more refined Markov chain model of the overall tournament. The
states of MC-2 are:

• $s_0^{(A_{n_A n_B})}$: Initial state. The first individual is in $A_{n_A n_B}$. No candidate second individuals have been
considered.

• $s_r^{(A_{n_A n_B})}$, $1 \le r \le n_{sh}$: The first individual is in $A_{n_A n_B}$. Furthermore, $r$ second individuals have been
considered and probabilistically found not to be θ-compatible.

• $s_{A_{ij}}$ (and $s_{A_{n_A n_B}}$), $i + j \le 2^{t+1}$: The winner is in $A_{ij}$ (resp. $A_{n_A n_B}$).

• $s_{B_{ij}}$, $i + j \le 2^{t+1}$: The winner is in $B_{ij}$.

The transition probability $\alpha_{A_{ij}}^{(A_{n_A n_B})}(t)$ in MC-1 is the conditional probability that the winner of a tournament
in generation $t$ is in $A_{ij}$ given that the first individual is in $A_{n_A n_B}$. The following theorem provides an
expression for $\alpha_{A_{ij}}^{(A_{n_A n_B})}(t)$ in terms of the transition probabilities of MC-2.

Figure 22. MC-2: Markov chain model of the search for a probabilistically θ-compatible second individual
and the determination of the tournament winner. The first individual is in $A$, has $n_A$ ancestors
which are in $A$, and has $n_B$ ancestors in $B$. Dependence of the transition probabilities on the
iteration $t$ is notationally suppressed for visual clarity.

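Theorem 5.2.3 Suppose that

• $\beta_{A_{ij}}^{(A_{n_A n_B})}(t)$ is the conditional probability that a candidate second individual in generation $t$ is
probabilistically θ-compatible and the tournament winner is in $A_{ij}$ given that the first individual is in $A_{n_A n_B}$;

• $\zeta^{(A_{n_A n_B})}(t)$ is the conditional probability that a candidate second individual in generation $t$ is not
probabilistically θ-compatible with the first individual given that the first individual is in $A_{n_A n_B}$; and

• $n_{sh}$ is the shuffle size.

Then the conditional probability that the winner of a tournament in generation $t$ is in $A_{ij}$ given that the first
individual is in $A_{n_A n_B}$ is

\[ \alpha_{A_{ij}}^{(A_{n_A n_B})}(t) = \beta_{A_{ij}}^{(A_{n_A n_B})}(t) \sum_{r=0}^{n_{sh} - 1} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^r + \delta_{i n_A}\,\delta_{j n_B} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^{n_{sh}} , \]

where $\delta_{ij}$ is the Kronecker delta.

Proof: The transition from state $s^{(A_{n_A n_B})}$ to state $s_{A_{ij}}$ in MC-1 (see Figure 21) is equivalent to the
corresponding absorption event in MC-2 (see Figure 22). Thus, the transition probability $\alpha_{A_{ij}}^{(A_{n_A n_B})}(t)$ is
equal to the probability of the absorption event, i.e.

\[ \alpha_{A_{ij}}^{(A_{n_A n_B})}(t) = \sum_{r=0}^{n_{sh} - 1} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^r \beta_{A_{ij}}^{(A_{n_A n_B})}(t) \cdot 1 + \delta_{i n_A}\,\delta_{j n_B} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^{n_{sh}} \cdot 1
 = \beta_{A_{ij}}^{(A_{n_A n_B})}(t) \sum_{r=0}^{n_{sh} - 1} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^r + \delta_{i n_A}\,\delta_{j n_B} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^{n_{sh}} . \]

■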
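
The absorption probability in Theorem 5.2.3 is a truncated geometric series plus a default term for the case in which the shuffle is exhausted without finding a compatible second individual. The sketch below illustrates this for fixed numeric inputs; the function and parameter names are hypothetical, and the per-candidate probabilities are assumed to be given rather than derived from MC-3.

```python
def winner_probability(
    per_candidate_win: float,
    not_compatible: float,
    shuffle_size: int,
    winner_class_is_first_class: bool,
) -> float:
    """Probability that the tournament winner lands in a given class.

    per_candidate_win: probability that one candidate is compatible and the
        resulting winner is in the class of interest.
    not_compatible: probability that one candidate is not compatible.
    shuffle_size: maximum number of candidates examined (n_sh).
    winner_class_is_first_class: True when the class of interest is exactly
        the first individual's own class (the Kronecker delta term), which
        is reached by default when no compatible candidate is found.
    """
    # r failed candidates followed by one successful candidate, r = 0..n_sh-1.
    geometric = sum(not_compatible ** r for r in range(shuffle_size))
    default = not_compatible ** shuffle_size if winner_class_is_first_class else 0.0
    return per_candidate_win * geometric + default


# Hypothetical numbers: two outcome classes with per-candidate probabilities
# 0.3 and 0.2, incompatibility probability 0.5, and shuffle size 3.
total = (winner_probability(0.3, 0.5, 3, False)
         + winner_probability(0.2, 0.5, 3, False)
         + winner_probability(0.0, 0.5, 3, True))
print(total)
```

Because the per-candidate outcome probabilities and the incompatibility probability sum to one in this example, the three absorption probabilities also sum to one, which is a useful sanity check on any implementation.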


The next theorem provides a similar result for the MC-1 transition probability $\alpha_{B_{ij}}^{(A_{n_A n_B})}(t)$, which is the
conditional probability that the winner of a tournament in generation $t$ is in $B_{ij}$, again given that the first
individual is in $A_{n_A n_B}$.

Theorem 5.2.4 Suppose that

• $\beta_{B_{ij}}^{(A_{n_A n_B})}(t)$ is the conditional probability that a candidate second individual in generation $t$ is
probabilistically θ-compatible and the tournament winner is in $B_{ij}$ given that the first individual is in $A_{n_A n_B}$;
and

• $\zeta^{(A_{n_A n_B})}(t)$ is the conditional probability that a candidate second individual is not probabilistically
θ-compatible given that the first individual is in $A_{n_A n_B}$.

Then the conditional probability that the winner of a tournament in generation $t$ is in $B_{ij}$ given that
the first individual is in $A_{n_A n_B}$ is

\[ \alpha_{B_{ij}}^{(A_{n_A n_B})}(t) = \beta_{B_{ij}}^{(A_{n_A n_B})}(t) \sum_{r=0}^{n_{sh} - 1} \left[ \zeta^{(A_{n_A n_B})}(t) \right]^r . \]

For each $r \in \{0, \ldots, n_{sh} - 1\}$, the transition from $s_r^{(A_{n_A n_B})}$ to $s_{A_{ij}}$, $s_{B_{ij}}$, or $s_{r+1}^{(A_{n_A n_B})}$ involves sampling the
population to choose a candidate second individual, determination of compatibility, and (possibly)
determination of the more fit individual. Furthermore, the probability of correct decision making depends on the
thresholding distance $D$ (see Section 5.1).

Thus, it is necessary to consider another "lower-level" (more refined) Markov chain MC-3 in which
these steps and dependencies appear explicitly. For $r < n_{sh}$ the transition probabilities do not depend on
$r$. Thus it is sufficient to consider a representative fragment for which the first individual is in $A_{n_A n_B}$ and $r$
candidate second individuals have been considered and found not to be probabilistically θ-compatible. MC-3
is shown in Figure 23. Its states are:

• $s_r^{(A_{n_A n_B})}$: Initial state. The first individual is in $A_{n_A n_B}$, and $r$ candidate second individuals have been
considered and found not to be probabilistically θ-compatible.

• $s_{r,1}^{(A_{n_A n_B})}(D)$, $0 \le D \le D_{\max}$: The conditions of $s_r^{(A_{n_A n_B})}$ hold. Furthermore, the candidate second
individual has individual similarity $D$ from the first individual and is in $A - A_{i - n_A, j - n_B}$ (i.e. is in $A$
and either 1) has other than $i - n_A$ ancestors in $A$, or 2) has other than $j - n_B$ ancestors in $B$).

• $s_{r,2}^{(A_{n_A n_B})}(D)$, $0 \le D \le D_{\max}$: The conditions of $s_r^{(A_{n_A n_B})}$ hold. Furthermore, the candidate second
individual has individual similarity $D$ from the first individual and is in $B - B_{i - n_A, j - n_B}$.

• $s_{r,3}^{(A_{n_A n_B})}(D)$, $0 \le D \le D_{\max}$: The conditions of $s_r^{(A_{n_A n_B})}$ hold. Furthermore, the candidate second
individual has individual similarity $D$ from the first individual and is in $A_{i - n_A, j - n_B}$.

• $s_{r,4}^{(A_{n_A n_B})}(D)$, $0 \le D \le D_{\max}$: The conditions of $s_r^{(A_{n_A n_B})}$ hold. Furthermore, the candidate second
individual has individual similarity $D$ from the first individual and is in $B_{i - n_A, j - n_B}$.

• $s_{r,5}^{(A_{n_A n_B})}(D)$, $0 \le D \le D_{\max}$: The conditions of state $s_{r,4}^{(A_{n_A n_B})}(D)$ hold. Furthermore, the individuals
are probabilistically θ-compatible, hence a "decision" must be made.

• $s_{r+1}^{(A_{n_A n_B})}$ (depicted twice): The candidate second individual is not probabilistically θ-compatible. This
state coincides with the "next" state of MC-2 (Figure 22).

• $s_{A_{ij}}$: The tournament winner is in $A_{ij}$.

• $s_{B_{ij}}$: The tournament winner is in $B_{ij}$.

• $s_F$ (depicted twice): Represents the union of all MC-2 final states not explicitly depicted, i.e.

\[ s_F = \bigcup_{(k,l) \ne (i,j)} \{ s_{A_{kl}},\ s_{B_{kl}} \} . \]

Figure 23. MC-3: Fragment of "low-level" Markov chain model of selection of the second individual in
BTS with thresholding and finite shuffle size. The first individual is in $A$, has $n_A$ ancestors in
$A$, and has $n_B$ ancestors in $B$. Also, $r$ candidate second individuals have been found not to be
probabilistically θ-compatible.

The transition probability $\beta_{A_{ij}}^{(A_{n_A n_B})}(t)$ in MC-2 is the conditional probability that a candidate second
individual in generation $t$ is probabilistically θ-compatible and the tournament winner is in $A_{ij}$ given that
the first individual is in $A_{n_A n_B}$. The following theorem provides an expression for $\beta_{A_{ij}}^{(A_{n_A n_B})}(t)$ in terms of
the transition probabilities of MC-3.

Theorem 5.2.5 Suppose that

• $\gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)$ is the conditional probability that an individual drawn from the current population
has individual similarity $D$ from the first individual and is in $A_{i - n_A, j - n_B}$, given the conditions of state
$s_r^{(A_{n_A n_B})}$;

• $\gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)$ is the conditional probability that an individual drawn from the current population
has individual similarity $D$ from the first individual and is in $B_{i - n_A, j - n_B}$, given the conditions of state
$s_r^{(A_{n_A n_B})}$;

• $\xi_D^{(A,A)}$ is the conditional probability that the individuals are probabilistically θ-compatible given
the conditions of state $s_{r,3}^{(A_{n_A n_B})}(D)$;

• $\xi_D^{(A,B)}$ is the conditional probability that the individuals are probabilistically θ-compatible given the
conditions of state $s_{r,4}^{(A_{n_A n_B})}(D)$; and

• $P_d(n_A, n_B, i - n_A, j - n_B, D)$ is the conditional probability that the first individual is more fit (see
Section 5.1) given the conditions of state $s_{r,5}^{(A_{n_A n_B})}(D)$.

Then the conditional probability in generation $t$ that a candidate second individual is probabilistically
θ-compatible and the tournament winner is in $A_{ij}$ given that the first individual is in $A_{n_A n_B}$ is

\[ \beta_{A_{ij}}^{(A_{n_A n_B})}(t) = \sum_D \left[ \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)\,\xi_D^{(A,A)} + \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)\,\xi_D^{(A,B)}\,P_d(n_A, n_B, i - n_A, j - n_B, D) \right] . \]

Proof: The transition event from $s_r^{(A_{n_A n_B})}$ to state $s_{A_{ij}}$ in MC-2 (see Figure 22) is equivalent to the
corresponding absorption event in MC-3 (see Figure 23). Thus, the transition probability $\beta_{A_{ij}}^{(A_{n_A n_B})}(t)$ is
equal to the probability of the absorption event. ■

The next theorem provides a similar result for the MC-2 transition probability $\beta_{B_{ij}}^{(A_{n_A n_B})}(t)$, which is the
conditional probability that a candidate second individual in generation $t$ is probabilistically θ-compatible
and the tournament winner is in $B_{ij}$, again given that the first individual is in $A_{n_A n_B}$.

Theorem 5.2.6 Suppose that

• $\gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)$ is the conditional probability that an individual drawn from the current population has
individual similarity $D$ from the first individual and is in $B_{i - n_A, j - n_B}$, given that the first individual is
in $A_{n_A n_B}$ and $r$ candidate second individuals have been considered and found not to be probabilistically
θ-compatible;

• $\xi_D^{(A,B)}$ is the conditional probability that the individuals are probabilistically θ-compatible given the
conditions of state $s_{r,4}^{(A_{n_A n_B})}(D)$; and

• $P_d(n_A, n_B, i - n_A, j - n_B, D)$ is the conditional probability that the first individual is more fit (see
Section 5.1) given the conditions of state $s_{r,5}^{(A_{n_A n_B})}(D)$.

Then the conditional probability in generation $t$ that a candidate second individual is probabilistically
θ-compatible and the tournament winner is in $B_{ij}$ given that the first individual is in $A_{n_A n_B}$ is

\[ \beta_{B_{ij}}^{(A_{n_A n_B})}(t) = \sum_D \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t)\,\xi_D^{(A,B)} \left[ 1 - P_d(n_A, n_B, i - n_A, j - n_B, D) \right] . \]

Proof: The transition event from $s_r^{(A_{n_A n_B})}$ to state $s_{B_{ij}}$ in MC-2 (see Figure 22) is equivalent to the
corresponding absorption event in MC-3 (see Figure 23). ■
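
For a fixed winner class (i, j), the two MC-2 transition probabilities are sums over the similarity D of products of a sampling probability, a compatibility probability, and (when the candidate is in B) a decision probability. The following sketch is an illustration under stated assumptions: the function name, the use of lists indexed by D, and the numeric values are hypothetical, and the decision probabilities are taken as given inputs rather than computed from Section 5.1.

```python
from typing import Sequence, Tuple


def mc2_transition_probabilities(
    sample_a: Sequence[float],
    sample_b: Sequence[float],
    compat_aa: Sequence[float],
    compat_ab: Sequence[float],
    first_wins: Sequence[float],
) -> Tuple[float, float]:
    """Per-candidate probabilities that the winner ends in A_ij or B_ij.

    All sequences are indexed by the individual similarity D.
    sample_a[D]:   P(candidate at similarity D and in the matching A class)
    sample_b[D]:   P(candidate at similarity D and in the matching B class)
    compat_aa[D]:  P(compatible | both individuals in A, similarity D)
    compat_ab[D]:  P(compatible | first in A, candidate in B, similarity D)
    first_wins[D]: P(first individual is more fit | decision at similarity D)
    """
    to_a = 0.0
    to_b = 0.0
    for d in range(len(sample_a)):
        # Candidate in A: the winner is in A whichever individual wins.
        to_a += sample_a[d] * compat_aa[d]
        # Candidate in B: the winner is in A only if the first individual wins.
        to_a += sample_b[d] * compat_ab[d] * first_wins[d]
        to_b += sample_b[d] * compat_ab[d] * (1.0 - first_wins[d])
    return to_a, to_b


# Hypothetical inputs for D in {0, 1}.
beta_a, beta_b = mc2_transition_probabilities(
    [0.2, 0.3], [0.1, 0.4], [1.0, 0.5], [1.0, 0.25], [0.8, 0.6])
print(beta_a, beta_b)
```

The two outputs always sum to the total probability that a compatible candidate of the matching classes is found, since each decision is resolved in favor of exactly one individual.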

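Finally, the following theorem provides the probability that a candidate second individual is not
probabilistically θ-compatible.

Theorem 5.2.7 The conditional probability that a candidate second individual is not probabilistically
θ-compatible given that the first individual is in $A_{n_A n_B}$ is

\[ \zeta^{(A_{n_A n_B})}(t) = 1 - \sum_D \left[ \xi_D^{(A,A)} \sum_i \sum_j \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) + \xi_D^{(A,B)} \sum_i \sum_j \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \right] . \]

Proof: The conditional probability that a candidate second individual is not probabilistically θ-compatible,
given that the first individual is in $A_{n_A n_B}$, is equal to the transition probability from $s_r^{(A_{n_A n_B})}$ to
$s_{r+1}^{(A_{n_A n_B})}$. This is given by

\[ \zeta^{(A_{n_A n_B})}(t) = \sum_D \sum_i \sum_j \left\{ \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \left[ 1 - \xi_D^{(A,A)} \right] + \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \left[ 1 - \xi_D^{(A,B)} \right] \right\} \]

\[ = \sum_D \sum_i \sum_j \left\{ \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) + \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \right\} - \sum_D \left[ \xi_D^{(A,A)} \sum_i \sum_j \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) + \xi_D^{(A,B)} \sum_i \sum_j \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \right] \]

\[ = 1 - \sum_D \left[ \xi_D^{(A,A)} \sum_i \sum_j \gamma_{A_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) + \xi_D^{(A,B)} \sum_i \sum_j \gamma_{B_{i - n_A, j - n_B}}^{(A_{n_A n_B})}(D, t) \right] , \]

which does not depend on $r$. ■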
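
The two forms of the incompatibility probability in the proof above (a direct sum of "sampled but not compatible" terms, and one minus the total compatibility probability) can be compared numerically. This sketch assumes, purely for illustration, that the sampling probabilities have already been summed over the ancestor counts i and j, leaving one value per similarity D for each class; the function names and numbers are hypothetical.

```python
from typing import Sequence


def not_compatible_direct(
    sample_a: Sequence[float], sample_b: Sequence[float],
    compat_aa: Sequence[float], compat_ab: Sequence[float],
) -> float:
    """Sum over D of P(candidate sampled at D) * P(not compatible at D)."""
    return sum(
        sample_a[d] * (1.0 - compat_aa[d]) + sample_b[d] * (1.0 - compat_ab[d])
        for d in range(len(sample_a))
    )


def not_compatible_complement(
    sample_a: Sequence[float], sample_b: Sequence[float],
    compat_aa: Sequence[float], compat_ab: Sequence[float],
) -> float:
    """One minus the total probability of sampling a compatible candidate.

    Valid when the sampling probabilities sum to one over both classes and
    all similarities, as they do for a full distribution over candidates.
    """
    return 1.0 - sum(
        compat_aa[d] * sample_a[d] + compat_ab[d] * sample_b[d]
        for d in range(len(sample_a))
    )


# Hypothetical distribution over candidates for D in {0, 1}.
args = ([0.2, 0.3], [0.1, 0.4], [1.0, 0.5], [1.0, 0.25])
print(not_compatible_direct(*args), not_compatible_complement(*args))
```

Neither function takes the number r of previously rejected candidates as an input, which mirrors the observation that the result does not depend on r.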

This section concludes with the observation that Kargupta's model of BTS (Section 2.6.5.2) may be
obtained as a special case of the model developed in this chapter by assuming that


•  a  compatible  second  individual  is  found  for  every  tournament, 

•  no  individual  contains  multiple  building  blocks, 

•  every  individual  which  contains  any  building  block  is  more  fit  than  every  individual  which  contains  no 
building  blocks,  and 

•  the  probabilities  of  correct  decision  making  are  static  (i.e.  independent  of  the  generation  t). 


5.3  Distribution  of  Fitnesses  in  a  Uniform  Random  Population 

The probability of correct decision making, and consequently the distribution of individuals between
competing classes, depends on the fitness distributions of the ancestors of the competing individuals (see
Sections 5.1 and 5.2). The fitness distributions are determined by the distribution from which the initial
population $P(0)$ is drawn, as well as the fitness function itself.

This section presents the fitness distributions obtained in the case of linkage-friendly genetic algorithms.
The analysis focuses on the expected values and variances (i.e. the first moment and the second central
moment) of the fitness distributions resulting when the fitness function is separable (see Section 5.3.1). The
competing classes under consideration are the class $I_\beta$ of individuals containing building block β and the
class $I_{\neg\beta}$ of individuals lacking building block β. These classes are formally defined in Section 5.3.1, along
with several other classes which appear frequently in the analysis. In each case, two distributions are
considered. The first is the (unconditional) fitness distribution of the class (Section 5.3.2). The other is the
conditional fitness distribution given also that the individual is a member of a pair of individuals drawn
randomly from the set of pairs having individual similarity $D$ (Section 5.3.3).

5.3.1 Preliminaries. As discussed in Section 2.6, the class of linkage-friendly genetic algorithms
includes  the  fast  messy  genetic  algorithm  (Section  2.6.4).  More  generally,  it  includes  the  generalized  fast 
messy  genetic  algorithm  proposed  in  Chapter  III.  Three  properties  of  these  algorithms  which  are  of  use  in 
the  analysis  of  this  section  are: 

•  the  individual  space  I  consists  of  ordered  pairs  of  finite  sequences,  where  elements  of  one  sequence  are 
alleles,  and  elements  of  the  other  sequence  are  loci  (see  Section  2.6.1); 

•  evaluation  of  individuals  in  which  one  or  more  loci  do  not  occur  relies  on  default  values  specified  by 
the  competitive  template  (see  Section  2.6.1);  and 

• the initial population is uniformly distributed over the set $I(\lambda)$ of length λ non-overspecified individuals.

The notations $I(\lambda)$, $I_\beta$, $I_\beta(\lambda)$, and $I_{\neg\beta}(\lambda)$ for important subsets of the individual space $I$ are introduced
in Sections 2.6.1 and 4.1.1. It is convenient to introduce special notation for other frequently mentioned
subsets of $I$, which simplifies the analysis presented in the sequel.

• For each $\lambda \in \{0, \ldots, \ell\}$ and each $i \in \{1, \ldots, m\}$, let $c \in I_f$ and define

\[ I_{\chi_i}(\lambda, c) = \{ (a, l) \in I(\lambda) : \neg (\forall L \in \Lambda_i)(\forall j \in \mathcal{L})[\, l_j = L \Rightarrow a_j = c_L \,] \} . \]

Then $I_{\chi_i}(\lambda, c)$ is the set of length λ individuals which disrupt the competitive template $c$ with respect
to subfunction $i$.

• For each $\lambda \in \{0, \ldots, \ell\}$ and each $i \in \{1, \ldots, m\}$, define

\[ I_{\neg\chi_i}(\lambda, c) = I(\lambda) - I_{\chi_i}(\lambda, c) . \]

Then $I_{\neg\chi_i}(\lambda, c)$ is the set of length λ individuals which do not disrupt the competitive template $c$ with
respect to subfunction $i$.

Certain intersections of these sets are also frequently mentioned. These intersections are denoted by
concatenation of subscripts. When subscripts refer to (possibly) different subfunctions, a comma is inserted for
clarity, e.g. for each $\lambda \in \{0, \ldots, \ell\}$ and each $i \in \{1, \ldots, m\}$,

\[ I_{\beta, \neg i \chi_i}(\lambda, c) = I_\beta(\lambda) \cap I_{\neg i}(\lambda) \cap I_{\chi_i}(\lambda, c) \]

is the set of length λ individuals which contain building block β, lack building block $i$, and disrupt the
competitive template $c$ with respect to subfunction $i$.

It is also convenient to introduce special notation for the cardinalities* of these sets. In general, the
number of individuals contained in a set $I_S(\lambda)$ is denoted $N_S(\lambda)$, e.g. the number of length λ individuals
containing building block β, lacking building block $i$, and disrupting the competitive template $c$ with respect
to subfunction $i$ is $N_{\beta, \neg i \chi_i}(\lambda, c)$. Analytical expressions for those cardinalities which appear in the analysis
are given in Appendix A.

Also, the set of pairs of individuals which share some number of defining loci appears frequently in the
sequel. Let $\lambda_c(x_1, x_2)$ denote the number of common defining loci of individuals $x_1$ and $x_2$. That is,

\[ \lambda_c((a_1, l_1), (a_2, l_2)) = \mathrm{card}(\{ L \in \mathcal{L} : (\exists i_1, i_2 \in \mathcal{L})[\, l_{1,i_1} = l_{2,i_2} = L \,] \}) . \]

*This research considers only the case of a finite genic alphabet A.

Also, let $\omega(\lambda_1, \lambda_2, \lambda_c)$ be the set of individual pairs $(x_1, x_2)$ for which $x_1$ contains building block β and is of
length $\lambda_1$, $x_2$ does not contain building block β and is of length $\lambda_2$, and $\lambda_c(x_1, x_2) = \lambda_c$. That is,

\[ \omega(\lambda_1, \lambda_2, \lambda_c) = \{ (x_1, x_2) \in I_\beta(\lambda_1) \times I_{\neg\beta}(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c \} . \]

The set $\omega(\lambda_1, \lambda_2, \lambda_c)$ is related to the set $\Omega(D)$ consisting of pairs of individuals in $A \times B$ having individual
similarity $D$ (see Section 5.1). In particular, if $A = I_\beta(\lambda_1)$, $B = I_{\neg\beta}(\lambda_2)$, and
$D = d(x_1, x_2) = \min\{\lambda_1, \lambda_2\} - \lambda_c(x_1, x_2)$, then $\Omega(D) = \omega(\lambda_1, \lambda_2, \lambda_c)$.

Finally,  the  fitness  distributions  considered  in  Sections  5.3.2  and  5.3.3  are  those  associated  with  fitness 
functions  which  can  be  written  as  the  sum  of  independent  subfunctions.  The  following  results  are  used 
so  frequently  in  their  analysis  that  they  are  stated  as  lemmas.  The  first  shows  that  certain  interesting 
conditional  variances  vanish. 

Lemma 5.3.1 (Special conditional variances of subfunction contributions) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $c \in I_F$, and $X \sim U(I(\lambda))$. Then the conditional variance of $\phi_i(X, c)$ given that $X$ contains building block $i$ is zero. Also, the conditional variance of $\phi_i(X, c)$ given that $X$ does not disrupt $c$ with respect to subfunction $i$ is zero. Finally, the conditional variance of $\phi_i(X, c)$ given that $X$ lacks building block $i$ and does not disrupt $c$ with respect to subfunction $i$ is also zero, i.e.

$$\mathrm{Var}[\phi_i(X, c) \mid X \in I_i(\lambda)] = \mathrm{Var}[\phi_i(X, c) \mid X \in I_{\neg\chi_i}(\lambda, c)] = \mathrm{Var}[\phi_i(X, c) \mid X \in I_{\neg i \neg\chi_i}(\lambda, c)] = 0 .$$

Proof: $x \in I_i(\lambda) \Rightarrow \phi_i(x, c) = \phi_i^*$. Thus,

$$\mathrm{Var}[\phi_i(X, c) \mid X \in I_i(\lambda)] = \mathcal{E}[\{\phi_i(X, c)\}^2 \mid X \in I_i(\lambda)] - \{\mathcal{E}[\phi_i(X, c) \mid X \in I_i(\lambda)]\}^2 = (\phi_i^*)^2 - (\phi_i^*)^2 = 0 .$$

Likewise, $x \in I_{\neg i \neg\chi_i}(\lambda, c) \Rightarrow x \in I_{\neg\chi_i}(\lambda, c) \Rightarrow \phi_i(x, c) = \phi_i(c, c)$, which is also independent of $x$. ■

The second lemma is a general result of mathematical statistics, although it is not found in many standard references (e.g. [2, 41, 53]). It is used frequently in the sequel for obtaining conditional variances.

Lemma 5.3.2 (Decomposition of (non-central) conditional second moment) Let $X$ be a random variable with space $S$, $f : S \to \mathbb{R}$, and $k \in \mathbb{R}$. Also, let $A \subseteq S$ such that $\mu_{f|A} = \mathcal{E}[f(X) \mid X \in A]$ and $\mathrm{Var}[f(X) \mid X \in A] = \mathcal{E}[(f(X) - \mu_{f|A})^2 \mid X \in A]$ exist. Then

$$\mathcal{E}[(f(X) - k)^2 \mid X \in A] = \mathrm{Var}[f(X) \mid X \in A] + (\mu_{f|A} - k)^2 .$$

Proof: By the linearity of the expected value operator, $\mathcal{E}[f(X) - \mu_{f|A} \mid X \in A] = \mu_{f|A} - \mu_{f|A} = 0$, so

$$\begin{aligned}
\mathrm{Var}[f(X) \mid X \in A] + (\mu_{f|A} - k)^2
&= \mathcal{E}[(f(X) - \mu_{f|A})^2 \mid X \in A] + (\mu_{f|A} - k)^2 \\
&= \mathcal{E}[(f(X) - \mu_{f|A})^2 \mid X \in A] + 2(\mu_{f|A} - k) \cdot \mathcal{E}[f(X) - \mu_{f|A} \mid X \in A] + (\mu_{f|A} - k)^2 \\
&= \mathcal{E}[(f(X) - \mu_{f|A})^2 + 2(f(X) - \mu_{f|A})(\mu_{f|A} - k) + (\mu_{f|A} - k)^2 \mid X \in A] \\
&= \mathcal{E}[\{(f(X) - \mu_{f|A}) + (\mu_{f|A} - k)\}^2 \mid X \in A] \\
&= \mathcal{E}[(f(X) - k)^2 \mid X \in A] . \qquad ■
\end{aligned}$$
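The decomposition of the conditional second moment can be checked numerically. The following Python sketch is illustrative only and is not part of the original analysis: it assumes X is uniform on a finite set A, represents f by the list of its values on A, and uses hypothetical helper names.

```python
import random


def conditional_mean(values):
    """Mean of f(X) for X uniform on A, where `values` lists f(x) for x in A."""
    return sum(values) / len(values)


def conditional_variance(values):
    """Population variance of f(X) for X uniform on A."""
    mu = conditional_mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def conditional_second_moment(values, k):
    """Second moment of f(X) about an arbitrary constant k, for X uniform on A."""
    return sum((v - k) ** 2 for v in values) / len(values)


# Arbitrary test data (an assumption made purely for illustration).
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
k = 0.37

lhs = conditional_second_moment(values, k)
rhs = conditional_variance(values) + (conditional_mean(values) - k) ** 2
print(abs(lhs - rhs) < 1e-9)
```

The two sides agree up to floating-point rounding for any choice of `k`, which is what allows the later proofs to replace a second moment about a global mean with a within-class variance plus a squared offset.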


5.3.2 Central Moments of Unconditional Fitness Distributions. In the following theorem, decompositions of the first two central moments of a particular subfunction's contribution to an individual's fitness are obtained. For a subfunction $i$, each quantity is expressed as a linear combination of three appropriate conditional expectations: that given that the individual contains building block $i$, that given that the individual does not disrupt the competitive template with respect to subfunction $i$, and that given that neither of these conditions holds. These decompositions are useful in the analysis of the fitness distributions associated with both types of individuals.


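Theorem 5.3.3 (Subfunction contribution expectations) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $c \in I_F$, and $X \sim U(S)$ where $S \subseteq I(\lambda)$. Then the expected value of $\phi_i(X, c)$ is

$$\mu_i(S, c) = \mathrm{card}(S)^{-1} \Big\{ \phi_i^* \cdot \mathrm{card}(S \cap I_i(\lambda)) + \mu_i^-(S, c) \cdot \mathrm{card}(S \cap I_{\neg i \chi_i}(\lambda, c)) + \phi_i(c, c) \cdot \mathrm{card}(S \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} \qquad (13)$$

where

$$\mu_i^-(S, c) = \mathcal{E}[\phi_i(X, c) \mid X \in S \cap I_{\neg i \chi_i}(\lambda, c)] . \qquad (14)$$

Also, the variance of $\phi_i(X, c)$ is

$$\begin{aligned}
\sigma_i^2(S, c) = \mathrm{card}(S)^{-1} \Big\{ & [\phi_i^* - \mu_i(S, c)]^2 \cdot \mathrm{card}(S \cap I_i(\lambda)) \\
& + \big( (\sigma_i^-)^2(S, c) + [\mu_i^-(S, c) - \mu_i(S, c)]^2 \big) \cdot \mathrm{card}(S \cap I_{\neg i \chi_i}(\lambda, c)) \\
& + [\phi_i(c, c) - \mu_i(S, c)]^2 \cdot \mathrm{card}(S \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} \qquad (15)
\end{aligned}$$

where

$$(\sigma_i^-)^2(S, c) = \mathrm{Var}[\phi_i(X, c) \mid X \in S \cap I_{\neg i \chi_i}(\lambda, c)] . \qquad (16)$$

Proof: By definition, the expected fitness contribution of subfunction $i$ is⁴

$$\mathcal{E}[\phi_i(X, c)] = \sum_{x \in I(\lambda)} \phi_i(x, c) \cdot \Pr[X = x] .$$

For $X \sim U(S)$, this may be written as

$$\begin{aligned}
\mathcal{E}[\phi_i(X, c)]
&= \mathrm{card}(S)^{-1} \Big\{ \sum_{x \in S \cap I_i(\lambda)} \phi_i(x, c) + \sum_{x \in S \cap I_{\neg i \chi_i}(\lambda, c)} \phi_i(x, c) + \sum_{x \in S \cap I_{\neg i \neg\chi_i}(\lambda, c)} \phi_i(x, c) \Big\} \\
&= \mathrm{card}(S)^{-1} \Big\{ \phi_i^* \cdot \mathrm{card}(S \cap I_i(\lambda)) + \mathcal{E}[\phi_i(X, c) \mid X \in S \cap I_{\neg i \chi_i}(\lambda, c)] \cdot \mathrm{card}(S \cap I_{\neg i \chi_i}(\lambda, c)) \\
&\qquad\qquad + \phi_i(c, c) \cdot \mathrm{card}(S \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} \\
&= \mu_i(S, c) ,
\end{aligned}$$

where we have used the facts that

$$x \in S \cap I_i(\lambda) \Rightarrow \phi_i(x, c) = \phi_i^*$$

and

$$x \in S \cap I_{\neg i \neg\chi_i}(\lambda, c) \Rightarrow \phi_i(x, c) = \phi_i(c, c) .$$

⁴The expectation is taken over all individuals, whether or not they are fully specified with respect to subfunction $i$. This is not necessarily the same as either the expectation over all individuals which are fully specified with respect to subfunction $i$ or the expectation over all length $\lambda$ individuals which are fully specified with respect to subfunction $i$. The latter are equivalent, and also equivalent to

$$\mathcal{E}[\phi_i(A, L)] = \sum_{(a, l) \in \mathcal{A}^{k_i} \times \pi(\mathcal{L}_i)} \phi_i(a, l) \cdot \Pr[A = a \wedge L = l] ,$$

which might be referred to as the subfunction mean.

Similarly, the variance of $\phi_i(X, c)$ is by definition⁵

$$\mathrm{Var}[\phi_i(X, c)] = \sum_{x \in S} \{\phi_i(x, c) - \mathcal{E}[\phi_i(X, c)]\}^2 \cdot \Pr[X = x] ,$$

and for $X \sim U(S)$,

$$\begin{aligned}
\mathrm{Var}[\phi_i(X, c)]
&= \sum_{x \in S} [\phi_i(x, c) - \mu_i(S, c)]^2 \cdot \Pr[X = x] \\
&= \mathrm{card}(S)^{-1} \Big\{ \sum_{x \in S \cap I_i(\lambda)} [\phi_i(x, c) - \mu_i(S, c)]^2 + \sum_{x \in S \cap I_{\neg i \chi_i}(\lambda, c)} [\phi_i(x, c) - \mu_i(S, c)]^2 \\
&\qquad\qquad + \sum_{x \in S \cap I_{\neg i \neg\chi_i}(\lambda, c)} [\phi_i(x, c) - \mu_i(S, c)]^2 \Big\} \\
&= \mathrm{card}(S)^{-1} \Big\{ [\phi_i^* - \mu_i(S, c)]^2 \cdot \mathrm{card}(S \cap I_i(\lambda)) \\
&\qquad\qquad + \mathcal{E}[\{\phi_i(X, c) - \mu_i(S, c)\}^2 \mid X \in S \cap I_{\neg i \chi_i}(\lambda, c)] \cdot \mathrm{card}(S \cap I_{\neg i \chi_i}(\lambda, c)) \\
&\qquad\qquad + [\phi_i(c, c) - \mu_i(S, c)]^2 \cdot \mathrm{card}(S \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} .
\end{aligned}$$

Upon simplification using Lemma 5.3.2, the result follows immediately. ■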

The constants $\phi_i^*$ depend on the fitness function, as do the $\mu_i^-(S, c)$'s, $(\sigma_i^-)^2(S, c)$'s, and the $\phi_i(c, c)$'s, which each also depends on the competitive template $c$. For the cases $S = I(\lambda)$, $S = I_\beta(\lambda)$, and $S = I_{\neg\beta}(\lambda)$, analytical expressions for the cardinalities of $S \cap I_i(\lambda)$, $S \cap I_{\neg i \chi_i}(\lambda, c)$, and $S \cap I_{\neg i \neg\chi_i}(\lambda, c)$ are given in Appendix A.
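The three-class decomposition of Theorem 5.3.3 can be illustrated on a small artificial set S. The Python sketch below is an assumption-laden illustration, not the thesis's implementation: the class sizes and the fitness values of the "lacks the block and disrupts the template" class are invented, and the function name is hypothetical. It evaluates the mean and variance through the decomposition (Equations 13 to 16) and compares them with the moments computed directly over S.

```python
from statistics import fmean, pvariance


def decomposed_moments(phi_star, phi_cc, disrupt_values, n_block, n_template):
    """Mean and variance of a subfunction contribution via the three-class split.

    phi_star       -- constant contribution of individuals containing block i
    phi_cc         -- constant contribution of individuals that lack block i
                      and leave the template undisturbed
    disrupt_values -- contributions of individuals that lack block i and
                      disrupt the template (must be non-empty)
    n_block, n_template -- sizes of the two constant classes
    """
    n_disrupt = len(disrupt_values)
    card_s = n_block + n_disrupt + n_template
    mu_minus = fmean(disrupt_values)       # within-class mean (Equation 14)
    var_minus = pvariance(disrupt_values)  # within-class variance (Equation 16)
    mu = (phi_star * n_block + mu_minus * n_disrupt + phi_cc * n_template) / card_s
    var = ((phi_star - mu) ** 2 * n_block
           + (var_minus + (mu_minus - mu) ** 2) * n_disrupt
           + (phi_cc - mu) ** 2 * n_template) / card_s
    return mu, var


# Invented example data: 2 individuals with the block, 5 disrupting, 3 undisturbed.
disrupt_values = [0.435, 0.29, 0.145, 0.0, 0.29]
mu, var = decomposed_moments(1.0, 0.58, disrupt_values, n_block=2, n_template=3)

# Direct computation over the whole set S for comparison.
all_values = [1.0] * 2 + disrupt_values + [0.58] * 3
direct_mu, direct_var = fmean(all_values), pvariance(all_values)
print(abs(mu - direct_mu) < 1e-12, abs(var - direct_var) < 1e-12)
```

Only the middle class needs its own mean and variance; the other two classes contribute constants, which is why the decomposition reduces to a handful of cardinalities.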


⁵Analogous to the preceding remark, the expectation is taken over the set of all individuals of length $\lambda$. Again, for $X \sim U(I(\lambda))$ this is not necessarily the same as the expectation over all individuals which are fully specified with respect to subfunction $i$, whether or not restricted to those individuals of length $\lambda$. The latter expectations are both equivalent to

$$\mathrm{Var}[\phi_i(A, L)] = \sum_{(a, l) \in \mathcal{A}^{k_i} \times \pi(\mathcal{L}_i)} \{\phi_i(a, l) - \mathcal{E}[\phi_i(A, L)]\}^2 \cdot \Pr[A = a \wedge L = l] ,$$

which might be referred to as the subfunction variance.

The next theorem considers the specific fitness distributions associated with random individuals drawn uniformly from the classes $I_\beta$ and $I_{\neg\beta}$. The following corollaries present decompositions of the first two central moments of each of these fitness distributions. For the case of normally distributed fitnesses, the first two moments determine the distribution. The first corollary relates to the fitness distribution for the class $I_\beta$.

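Corollary 5.3.4 (Fitness expectations of individuals containing building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $c \in I_F$, and $X \sim U(I_\beta(\lambda))$. Then the expected value of $\Phi_c(X)$ is

$$\mathcal{E}[\Phi_c(X)] = \phi_\beta^* + \sum_{i \ne \beta} \mu_{i|\beta}(\lambda, c) ,$$

where

$$\mu_{i|\beta}(\lambda, c) = N_\beta(\lambda)^{-1} \Big\{ \phi_i^* \cdot N_{\beta,i}(\lambda) + \mu_i^-(I_\beta(\lambda), c) \cdot N_{\beta,\neg i \chi_i}(\lambda, c) + \phi_i(c, c) \cdot N_{\beta,\neg i \neg\chi_i}(\lambda, c) \Big\}$$

and the $\mu_i^-(I_\beta(\lambda), c)$'s are defined by Equation 14. Furthermore, the variance of $\Phi_c(X)$ is

$$\begin{aligned}
\mathrm{Var}[\Phi_c(X)] = N_\beta(\lambda)^{-1} \sum_{i \ne \beta} \Big\{ & [\phi_i^* - \mu_{i|\beta}(\lambda, c)]^2 \cdot N_{\beta,i}(\lambda) \\
& + \big( (\sigma_i^-)^2(I_\beta(\lambda), c) + [\mu_i^-(I_\beta(\lambda), c) - \mu_{i|\beta}(\lambda, c)]^2 \big) \cdot N_{\beta,\neg i \chi_i}(\lambda, c) \\
& + [\phi_i(c, c) - \mu_{i|\beta}(\lambda, c)]^2 \cdot N_{\beta,\neg i \neg\chi_i}(\lambda, c) \Big\} \\
& + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X, c), \phi_j(X, c)] , \qquad (18)
\end{aligned}$$

where the $(\sigma_i^-)^2(I_\beta(\lambda), c)$'s are defined by Equation 16.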
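Proof: By the linearity of the expected value operator,

$$\mathcal{E}[\Phi_c(X)] = \sum_{i=1}^{m} \mathcal{E}[\phi_i(X, c)] . \qquad (19)$$

Because $X \in I_\beta(\lambda)$, the term corresponding to $i = \beta$ is just $\mathcal{E}[\phi_\beta(X, c)] = \phi_\beta^*$, while by Theorem 5.3.3, the terms corresponding to $i \ne \beta$ may be written

$$\mathcal{E}[\phi_i(X, c)] = \mathrm{card}(I_\beta(\lambda))^{-1} \Big\{ \phi_i^* \cdot \mathrm{card}(I_\beta(\lambda) \cap I_i(\lambda)) + \mu_i^-(I_\beta(\lambda), c) \cdot \mathrm{card}(I_\beta(\lambda) \cap I_{\neg i \chi_i}(\lambda, c)) + \phi_i(c, c) \cdot \mathrm{card}(I_\beta(\lambda) \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} ,$$

which completes the proof of the claimed expected value. Similarly, the conditional variance may be expressed

$$\begin{aligned}
\mathrm{Var}[\Phi_c(X)]
&= \sum_{i=1}^{m} \mathrm{Var}[\phi_i(X, c)] + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X, c), \phi_j(X, c)] \\
&= \sum_{i \ne \beta} \mathrm{Var}[\phi_i(X, c)] + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X, c), \phi_j(X, c)] ,
\end{aligned}$$

where we have used Lemma 5.3.1. Theorem 5.3.3 implies that the remaining variance terms are

$$\begin{aligned}
\mathrm{Var}[\phi_i(X, c)] = \mathrm{card}(I_\beta(\lambda))^{-1} \Big\{ & [\phi_i^* - \mu_i(I_\beta(\lambda), c)]^2 \cdot \mathrm{card}(I_\beta(\lambda) \cap I_i(\lambda)) \\
& + \big( (\sigma_i^-)^2(I_\beta(\lambda), c) + [\mu_i^-(I_\beta(\lambda), c) - \mu_i(I_\beta(\lambda), c)]^2 \big) \cdot \mathrm{card}(I_\beta(\lambda) \cap I_{\neg i \chi_i}(\lambda, c)) \\
& + [\phi_i(c, c) - \mu_i(I_\beta(\lambda), c)]^2 \cdot \mathrm{card}(I_\beta(\lambda) \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} .
\end{aligned}$$

It remains only to note that $\mu_i(I_\beta(\lambda), c) = \mu_{i|\beta}(\lambda, c)$.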

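The next corollary relates to the fitness distribution for the class $I_{\neg\beta}$.

Corollary 5.3.5 (Fitness expectations of individuals lacking building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $c \in I_F$, and $X \sim U(I_{\neg\beta}(\lambda))$. Then the expected value of $\Phi_c(X)$ is

$$\mathcal{E}[\Phi_c(X)] = \sum_{i=1}^{m} \mu_{i|\neg\beta}(\lambda, c) ,$$

where

$$\mu_{i|\neg\beta}(\lambda, c) = N_{\neg\beta}(\lambda)^{-1} \Big\{ \phi_i^* \cdot N_{\neg\beta,i}(\lambda) + \mu_i^-(I_{\neg\beta}(\lambda), c) \cdot N_{\neg\beta,\neg i \chi_i}(\lambda, c) + \phi_i(c, c) \cdot N_{\neg\beta,\neg i \neg\chi_i}(\lambda, c) \Big\} \qquad (20)$$

and the $\mu_i^-(I_{\neg\beta}(\lambda), c)$'s are defined by Equation 14. Furthermore, the variance of $\Phi_c(X)$ is

$$\begin{aligned}
\mathrm{Var}[\Phi_c(X)] = N_{\neg\beta}(\lambda)^{-1} \sum_{i=1}^{m} \Big\{ & [\phi_i^* - \mu_{i|\neg\beta}(\lambda, c)]^2 \cdot N_{\neg\beta,i}(\lambda) \\
& + \big( (\sigma_i^-)^2(I_{\neg\beta}(\lambda), c) + [\mu_i^-(I_{\neg\beta}(\lambda), c) - \mu_{i|\neg\beta}(\lambda, c)]^2 \big) \cdot N_{\neg\beta,\neg i \chi_i}(\lambda, c) \\
& + [\phi_i(c, c) - \mu_{i|\neg\beta}(\lambda, c)]^2 \cdot N_{\neg\beta,\neg i \neg\chi_i}(\lambda, c) \Big\} \\
& + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X, c), \phi_j(X, c)] ,
\end{aligned}$$

where the $(\sigma_i^-)^2(I_{\neg\beta}(\lambda), c)$'s are defined by Equation 16.

Proof: Equation 19 holds, and by Theorem 5.3.3, each term may be written

$$\begin{aligned}
\mathcal{E}[\phi_i(X, c)] &= \mathrm{card}(I_{\neg\beta}(\lambda))^{-1} \Big\{ \phi_i^* \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_i(\lambda)) + \mu_i^-(I_{\neg\beta}(\lambda), c) \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_{\neg i \chi_i}(\lambda, c)) \\
&\qquad\qquad + \phi_i(c, c) \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} \\
&= \mu_{i|\neg\beta}(\lambda, c) ,
\end{aligned}$$

which completes the proof of the claimed expected value.⁶ Similarly, the conditional variance may be expressed

$$\mathrm{Var}[\Phi_c(X)] = \sum_{i=1}^{m} \mathrm{Var}[\phi_i(X, c)] + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X, c), \phi_j(X, c)] .$$

By Theorem 5.3.3 each variance term may be written

$$\begin{aligned}
\mathrm{Var}[\phi_i(X, c)] = \mathrm{card}(I_{\neg\beta}(\lambda))^{-1} \Big\{ & [\phi_i^* - \mu_i(I_{\neg\beta}(\lambda), c)]^2 \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_i(\lambda)) \\
& + \big( (\sigma_i^-)^2(I_{\neg\beta}(\lambda), c) + [\mu_i^-(I_{\neg\beta}(\lambda), c) - \mu_i(I_{\neg\beta}(\lambda), c)]^2 \big) \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_{\neg i \chi_i}(\lambda, c)) \\
& + [\phi_i(c, c) - \mu_i(I_{\neg\beta}(\lambda), c)]^2 \cdot \mathrm{card}(I_{\neg\beta}(\lambda) \cap I_{\neg i \neg\chi_i}(\lambda, c)) \Big\} .
\end{aligned}$$

Since $\mu_i(I_{\neg\beta}(\lambda), c) = \mu_{i|\neg\beta}(\lambda, c)$, the proof is complete. ■

⁶As reflected in the expressions given in Appendix A, the case for which $i = \beta$ has

$$I_{\neg\beta}(\lambda) \cap I_\beta(\lambda) = \{\} \Rightarrow N_{\neg\beta,\beta}(\lambda) = 0$$

and

$$I_{\neg\beta}(\lambda) \cap I_{\neg\beta \chi_\beta}(\lambda, c) = I_{\neg\beta \chi_\beta}(\lambda, c) \Rightarrow N_{\neg\beta,\neg\beta \chi_\beta}(\lambda, c) = N_{\neg\beta \chi_\beta}(\lambda, c) .$$

5.3.3 Central Moments of Conditional Fitness Distributions. This section considers the conditional fitness distribution of individuals $X_1 \sim U(I_\beta(\lambda_1))$, given that $X_1$ shares $\lambda_c$ defining loci with an individual $X_2 \sim U(I_{\neg\beta}(\lambda_2))$. It also considers the corresponding conditional fitness distribution for $X_2$. The following lemma is important in the analysis of both distributions.

Lemma 5.3.6 (Number of individuals sharing $\lambda_c$ defining loci, Part I) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of subfunction $\beta$, and $k = \mathrm{card}(\mathcal{L}_\beta)$. Suppose that $x_1 \in I_\beta(\lambda_1)$, where $I$ is an lfGA individual space with finite genic alphabet $\mathcal{A}$ and nominal string length $\ell$. Then the number of individuals $x_2 \in I_{\neg\beta}(\lambda_2)$ sharing $\lambda_c$ defining loci with $x_1$ is

$$N_c^{(\beta)}(\lambda_1, \lambda_2, \lambda_c) = [\mathrm{card}(\mathcal{A})]^{\lambda_2} \binom{\lambda_1}{\lambda_c} \binom{\ell - \lambda_1}{\lambda_2 - \lambda_c} - [\mathrm{card}(\mathcal{A})]^{\lambda_2 - k} \binom{\lambda_1 - k}{\lambda_c - k} \binom{\ell - \lambda_1}{\lambda_2 - \lambda_c} . \qquad (21)$$

Also, the number of pairs $(x_1, x_2) \in I_\beta(\lambda_1) \times I_{\neg\beta}(\lambda_2)$ sharing $\lambda_c$ defining loci is

$$\mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c)) = N_\beta(\lambda_1) \cdot N_c^{(\beta)}(\lambda_1, \lambda_2, \lambda_c) . \qquad (22)$$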


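Proof: The sets $I_\beta(\lambda_2)$ and $I_{\neg\beta}(\lambda_2)$ form a partition of $I(\lambda_2)$. Thus,

$$\mathrm{card}(\{x_2 \in I_{\neg\beta}(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c\}) = \mathrm{card}(\{x_2 \in I(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c\}) - \mathrm{card}(\{x_2 \in I_\beta(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c\}) .$$

For arbitrary individuals $x_2 \in I(\lambda_2)$, each of the $\lambda_2$ alleles has $\mathrm{card}(\mathcal{A})$ possible values. For such individuals having $\lambda_c$ defining loci in common with an individual $x_1 \in I_\beta(\lambda_1)$, the $\lambda_c$ common loci must be chosen from the $\lambda_1$ loci of $x_1$, while the remaining $\lambda_2 - \lambda_c$ loci of $x_2$ must be chosen from the $\ell - \lambda_1$ loci for which $x_1$ does not contain genes. Thus,

$$\mathrm{card}(\{x_2 \in I(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c\}) = [\mathrm{card}(\mathcal{A})]^{\lambda_2} \binom{\lambda_1}{\lambda_c} \binom{\ell - \lambda_1}{\lambda_2 - \lambda_c} .$$

For arbitrary individuals $x_2 \in I_\beta(\lambda_2)$, each of the $k$ alleles corresponding to the loci of subfunction $\beta$ is fixed. The remaining $\lambda_2 - k$ alleles have $\mathrm{card}(\mathcal{A})$ possible values each. For such individuals having $\lambda_c$ defining loci in common with an individual $x_1 \in I_\beta(\lambda_1)$, $k$ of the $\lambda_c$ common loci are those of subfunction $\beta$. The remaining $\lambda_c - k$ must be chosen from the other $\lambda_1 - k$ loci of $x_1$. The remaining $\lambda_2 - \lambda_c$ loci of $x_2$ must be chosen from the $\ell - \lambda_1$ loci for which $x_1$ does not contain genes. Thus,

$$\mathrm{card}(\{x_2 \in I_\beta(\lambda_2) : \lambda_c(x_1, x_2) = \lambda_c\}) = [\mathrm{card}(\mathcal{A})]^{\lambda_2 - k} \binom{\lambda_1 - k}{\lambda_c - k} \binom{\ell - \lambda_1}{\lambda_2 - \lambda_c} ,$$

which completes the proof of the claimed expression for $N_c^{(\beta)}(\lambda_1, \lambda_2, \lambda_c)$. Since $x_1$ is arbitrary, every $x_1 \in I_\beta(\lambda_1)$ has the same number of individuals $x_2 \in I_{\neg\beta}(\lambda_2)$ such that $(x_1, x_2) \in \omega(\lambda_1, \lambda_2, \lambda_c)$, which proves the claim regarding the cardinality of $\omega(\lambda_1, \lambda_2, \lambda_c)$. ■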
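The counting argument in this proof can be checked by brute force on a small instance. The Python sketch below is illustrative and rests on assumptions that are not stated in the source: an individual is treated as an unordered set of distinct loci with one allele each, the loci of the building block are taken to be 0, ..., k-1, the block's required allele is taken to be 0 at each of those loci, and x1 is taken to occupy loci 0, ..., lam1-1. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def formula_count(ell, k, card_a, lam1, lam2, lam_c):
    """Closed-form count of x2 lacking the block and sharing lam_c loci with x1."""
    all_x2 = card_a ** lam2 * comb(lam1, lam_c) * comb(ell - lam1, lam2 - lam_c)
    if lam_c < k:
        # An x2 containing the block would share at least k loci with x1.
        return all_x2
    with_block = (card_a ** (lam2 - k) * comb(lam1 - k, lam_c - k)
                  * comb(ell - lam1, lam2 - lam_c))
    return all_x2 - with_block


def brute_force_count(ell, k, card_a, lam1, lam2, lam_c):
    """Enumerate every x2 of length lam2 and count those meeting the conditions."""
    block_loci = range(k)        # assumed loci of the building block
    x1_loci = set(range(lam1))   # x1 contains the block, so it covers loci 0..k-1
    count = 0
    for loci in combinations(range(ell), lam2):
        if len(x1_loci & set(loci)) != lam_c:
            continue
        for alleles in product(range(card_a), repeat=lam2):
            genes = dict(zip(loci, alleles))
            contains_block = all(genes.get(locus) == 0 for locus in block_loci)
            if not contains_block:
                count += 1
    return count


# Small instance: 6 loci, order-2 block, binary alphabet.
print(formula_count(6, 2, 2, 3, 3, 2), brute_force_count(6, 2, 2, 3, 3, 2))
```

Because the closed form depends on x1 only through its length, any other choice of the non-block loci of x1 gives the same count, which is the property the subsequent theorem relies on.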

The significance of Equation 21 is not so much the specific expression for $N_c^{(\beta)}(\lambda_1, \lambda_2, \lambda_c)$, but rather the fact that it is independent of $x_1$. That is, for a given $\lambda_c$ every individual $x_1 \in I_\beta(\lambda_1)$ has the same number of individuals $x_2 \in I_{\neg\beta}(\lambda_2)$ with which it shares $\lambda_c$ defining loci. The following theorem is a consequence.

Theorem 5.3.7 (Conditional distribution of individuals containing building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, and $c \in I_F$. Suppose $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$ are independent. Then the conditional distribution of $X_1$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is also $U(I_\beta(\lambda_1))$. That is, the probability that $X_1 = x_1$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$f_1(x_1 \mid \lambda_c(X_1, X_2) = \lambda_c) = [N_\beta(\lambda_1)]^{-1} .$$


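Proof: By the Law of Total Probability, the conditional probability that $X_1 = x_1$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\begin{aligned}
f_1(x_1 \mid \lambda_c(X_1, X_2) = \lambda_c) &= \Pr[X_1 = x_1 \mid \lambda_c(X_1, X_2) = \lambda_c] \\
&= \sum_{x_2 \in I_{\neg\beta}(\lambda_2)} \Pr[X_1 = x_1 \wedge X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] .
\end{aligned}$$

For $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$ independent, this may be written

$$\begin{aligned}
f_1(x_1 \mid \lambda_c(X_1, X_2) = \lambda_c)
&= \sum_{x_2 \in I_{\neg\beta}(\lambda_2)} \frac{\Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2] \cdot \Pr[X_1 = x_1] \cdot \Pr[X_2 = x_2]}{\Pr[\lambda_c(X_1, X_2) = \lambda_c]} \\
&= \frac{[N_\beta(\lambda_1)]^{-1} \cdot [N_{\neg\beta}(\lambda_2)]^{-1} \cdot \sum_{x_2 \in I_{\neg\beta}(\lambda_2)} \Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2]}{\mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c)) \cdot [N_\beta(\lambda_1)]^{-1} \cdot [N_{\neg\beta}(\lambda_2)]^{-1}} .
\end{aligned}$$

Because $\Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2] = 1$ if $\lambda_c(x_1, x_2) = \lambda_c$, and 0 otherwise, this is just

$$\begin{aligned}
f_1(x_1 \mid \lambda_c(X_1, X_2) = \lambda_c) &= \mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c))^{-1} \cdot N_c^{(\beta)}(\lambda_1, \lambda_2, \lambda_c) \\
&= [N_\beta(\lambda_1)]^{-1} ,
\end{aligned}$$

where we have used Lemma 5.3.6. ■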

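Of course, since the conditional distribution of $X_1$ is identical to its unconditional distribution, the conditional expectations are identical to the unconditional expectations. In particular, the following corollary gives the conditional expectation and variance of the fitness distribution for the class $I_\beta(\lambda_1)$.

Corollary 5.3.8 (Conditional fitness expectations of individuals containing building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, and $c \in I_F$. Suppose $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$. Then the conditional expectation of $\Phi_c(X_1)$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\mathcal{E}[\Phi_c(X_1) \mid \lambda_c(X_1, X_2) = \lambda_c] = \phi_\beta^* + \sum_{i \ne \beta} \mu_{i|\beta}(\lambda_1, c) ,$$

where

$$\mu_{i|\beta}(\lambda_1, c) = N_\beta(\lambda_1)^{-1} \Big\{ \phi_i^* \cdot N_{\beta,i}(\lambda_1) + \mu_i^-(I_\beta(\lambda_1), c) \cdot N_{\beta,\neg i \chi_i}(\lambda_1, c) + \phi_i(c, c) \cdot N_{\beta,\neg i \neg\chi_i}(\lambda_1, c) \Big\} ,$$

and the $\mu_i^-(I_\beta(\lambda_1), c)$'s are defined by Equation 14. Furthermore, the conditional variance of $\Phi_c(X_1)$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\begin{aligned}
\mathrm{Var}[\Phi_c(X_1) \mid \lambda_c(X_1, X_2) = \lambda_c] = N_\beta(\lambda_1)^{-1} \sum_{i \ne \beta} \Big\{ & [\phi_i^* - \mu_{i|\beta}(\lambda_1, c)]^2 \cdot N_{\beta,i}(\lambda_1) \\
& + \big( (\sigma_i^-)^2(I_\beta(\lambda_1), c) + [\mu_i^-(I_\beta(\lambda_1), c) - \mu_{i|\beta}(\lambda_1, c)]^2 \big) \cdot N_{\beta,\neg i \chi_i}(\lambda_1, c) \\
& + [\phi_i(c, c) - \mu_{i|\beta}(\lambda_1, c)]^2 \cdot N_{\beta,\neg i \neg\chi_i}(\lambda_1, c) \Big\} \\
& + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X_1, c), \phi_j(X_1, c)] , \qquad (23)
\end{aligned}$$

where the $(\sigma_i^-)^2(I_\beta(\lambda_1), c)$'s are defined by Equation 16.

Proof: The result follows immediately from Theorem 5.3.7 and Corollary 5.3.4. ■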

The remainder of this section considers the conditional fitness distribution of individuals $X_2 \sim U(I_{\neg\beta}(\lambda_2))$, given that $X_2$ shares $\lambda_c$ defining loci with an individual $X_1 \sim U(I_\beta(\lambda_1))$. In contrast to the situation for the class $I_\beta$, here the conditional distribution of fitnesses is not in general identical to the unconditional distribution. This is because for an individual $x_2 \in I_{\neg\beta}(\lambda_2)$ and a given number $\lambda_c$ of common defining loci, the number of individuals $x_1 \in I_\beta(\lambda_1)$ such that $\lambda_c(x_1, x_2) = \lambda_c$ depends on the choice of defining loci for $x_2$. This is made precise by the following lemma.

Lemma 5.3.9 (Number of individuals sharing $\lambda_c$ defining loci, Part II) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, $\mathcal{L}_\beta$ the set of defining loci of subfunction $\beta$, and $k = \mathrm{card}(\mathcal{L}_\beta)$. Suppose that $x_2 \in I_{\neg\beta}(\lambda_2)$, where $I$ is an lfGA individual space with finite genic alphabet $\mathcal{A}$ and nominal string length $\ell$. Then the number of individuals $x_1 \in I_\beta(\lambda_1)$ sharing $\lambda_c$ defining loci with $x_2$ is

$$N_c^{(\neg\beta)}(\lambda_1, \lambda_2, \lambda_c) = [\mathrm{card}(\mathcal{A})]^{\lambda_1 - k} \binom{\lambda_2 - r(x_2)}{\lambda_c - r(x_2)} \binom{\ell - \lambda_2 - (k - r(x_2))}{\lambda_1 - \lambda_c - (k - r(x_2))} , \qquad (24)$$

where $r(x_2)$ is the number of loci of subfunction $\beta$ with respect to which $x_2$ is defined.

Proof: For arbitrary individuals $x_1 \in I_\beta(\lambda_1)$, each of the $k$ alleles corresponding to the loci of subfunction $\beta$ is fixed. The remaining $\lambda_1 - k$ alleles have $\mathrm{card}(\mathcal{A})$ possible values each. For such individuals having $\lambda_c$ defining loci in common with an individual $x_2 \in I_{\neg\beta}(\lambda_2)$, $r(x_2)$ of the $\lambda_c$ common loci are those of subfunction $\beta$. The remaining $\lambda_c - r(x_2)$ must be chosen from the other $\lambda_2 - r(x_2)$ loci of $x_2$. Of the $\lambda_1 - \lambda_c$ defining loci of $x_1$ which are not shared by $x_2$, $k - r(x_2)$ are those of subfunction $\beta$. The remaining $\lambda_1 - \lambda_c - (k - r(x_2))$ must be chosen from the $\ell - \lambda_2 - (k - r(x_2))$ non-subfunction-$\beta$ loci for which $x_2$ does not contain genes, which completes the proof. ■
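The dependence of this count on r(x2) can be seen by enumeration on a small instance. The Python sketch below is illustrative only and makes assumptions not found in the source: individuals are unordered sets of distinct loci, the building block occupies loci 0, ..., k-1 with fixed alleles, and the function names are hypothetical. It compares the closed form with a brute-force enumeration of every x1 containing the block.

```python
from itertools import combinations, product
from math import comb


def formula_count_x1(ell, k, card_a, lam1, lam2, lam_c, r):
    """Closed-form count of x1 containing the block and sharing lam_c loci with x2.

    r is the number of block loci at which x2 is defined.
    """
    free_needed = lam1 - lam_c - (k - r)
    if lam1 < k or lam_c < r or free_needed < 0:
        return 0  # no x1 can satisfy the constraints
    return (card_a ** (lam1 - k)
            * comb(lam2 - r, lam_c - r)
            * comb(ell - lam2 - (k - r), free_needed))


def brute_force_count_x1(ell, k, card_a, lam1, x2_loci, lam_c):
    """Enumerate every x1 of length lam1 that contains the block."""
    block = set(range(k))  # assumed loci of the building block
    x2_loci = set(x2_loci)
    count = 0
    for loci in combinations(range(ell), lam1):
        loci = set(loci)
        if not block <= loci or len(loci & x2_loci) != lam_c:
            continue
        # Block alleles are fixed; every other allele is free.
        count += sum(1 for _ in product(range(card_a), repeat=lam1 - k))
    return count


# Two choices of loci for x2 with the same length but different r.
for x2_loci in [(0, 2, 3), (3, 4, 5)]:
    r = len(set(x2_loci) & {0, 1})
    print(x2_loci, r,
          formula_count_x1(6, 2, 2, 3, 3, 1, r),
          brute_force_count_x1(6, 2, 2, 3, x2_loci, 1))
```

Two individuals x2 of equal length can therefore have different numbers of partners x1, depending only on how many block loci they specify.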

The conditional moments of the fitness distribution for individuals $X_2 \sim U(I_{\neg\beta}(\lambda_2))$, given that $X_2$ shares $\lambda_c$ defining loci with an individual $X_1 \sim U(I_\beta(\lambda_1))$, are considered next. Because the number of individuals $x_1 \in I_\beta(\lambda_1)$ such that $\lambda_c(x_1, x_2) = \lambda_c$ (Equation 24) depends on $x_2$, and in particular on its defining loci, the conditional distribution is not in general uniform. The following theorem presents the conditional probability density function for the class $I_{\neg\beta}$.

Theorem 5.3.10 (Conditional distribution of individuals lacking building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, and $c \in I_F$. Suppose $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$ are independent. Then the conditional density function of $X_2$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$f_2(x_2 \mid \lambda_c(X_1, X_2) = \lambda_c) = f_{2,r} ,$$

where

$$x_2 \in R(r, \lambda_2) = \{x \in I_{\neg\beta}(\lambda_2) : x \text{ is defined w.r.t. exactly } r \text{ of the loci of subfunction } \beta\} , \qquad (25)$$

$$f_{2,r} = N_c^{(\neg\beta)}(\lambda_1, \lambda_2, \lambda_c, r) \cdot \mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c))^{-1} , \qquad (26)$$

the $N_c^{(\neg\beta)}(\lambda_1, \lambda_2, \lambda_c, r)$'s are defined by Equation 24, and $\mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c))$ is given by Equation 22.

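Proof: By the Law of Total Probability, the conditional probability that $X_2 = x_2$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\begin{aligned}
f_2(x_2 \mid \lambda_c(X_1, X_2) = \lambda_c) &= \Pr[X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] \\
&= \sum_{x_1 \in I_\beta(\lambda_1)} \Pr[X_1 = x_1 \wedge X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] .
\end{aligned}$$

For $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$ independent, this may be written

$$\begin{aligned}
f_2(x_2 \mid \lambda_c(X_1, X_2) = \lambda_c)
&= \sum_{x_1 \in I_\beta(\lambda_1)} \frac{\Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2] \cdot \Pr[X_1 = x_1] \cdot \Pr[X_2 = x_2]}{\Pr[\lambda_c(X_1, X_2) = \lambda_c]} \\
&= \frac{[N_\beta(\lambda_1)]^{-1} \cdot [N_{\neg\beta}(\lambda_2)]^{-1} \cdot \sum_{x_1 \in I_\beta(\lambda_1)} \Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2]}{\mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c)) \cdot [N_\beta(\lambda_1)]^{-1} \cdot [N_{\neg\beta}(\lambda_2)]^{-1}} .
\end{aligned}$$

Because $\Pr[\lambda_c(X_1, X_2) = \lambda_c \mid X_1 = x_1 \wedge X_2 = x_2] = 1$ if $\lambda_c(x_1, x_2) = \lambda_c$, and 0 otherwise, this is just

$$f_2(x_2 \mid \lambda_c(X_1, X_2) = \lambda_c) = \mathrm{card}(\omega(\lambda_1, \lambda_2, \lambda_c))^{-1} \cdot N_c^{(\neg\beta)}(\lambda_1, \lambda_2, \lambda_c, r(x_2)) ,$$

where we have used Lemma 5.3.9. ■

Because the conditional distribution of $X_2$ is not uniform, the decomposition of the subfunction contributions provided by Theorem 5.3.3 does not apply directly. The following theorem presents a decomposition which does apply.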

Theorem 5.3.11 (Conditional expectations of subfunction contributions) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, and $c \in I_F$. Suppose $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$ are independent. Then the conditional expectation of $\phi_i(X_2, c)$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\mu_{i|\neg\beta,\lambda_c}(\lambda_2, c) = \sum_{r=0}^{k} \mathrm{card}(R(r, \lambda_2)) \cdot f_{2,r} \cdot \mu_i(R(r, \lambda_2), c) , \qquad (27)$$

where the $R(r, \lambda_2)$'s are defined by Equation 25, the $f_{2,r}$'s are defined by Equation 26, and the $\mu_i(R(r, \lambda_2), c)$'s are defined by Equation 13. Also, the conditional variance of $\phi_i(X_2, c)$ is

$$\sigma^2_{i|\neg\beta,\lambda_c}(\lambda_2, c) = \sum_{r=0}^{k} \mathrm{card}(R(r, \lambda_2)) \cdot f_{2,r} \cdot \Big\{ \sigma_i^2(R(r, \lambda_2), c) + [\mu_i(R(r, \lambda_2), c) - \mu_{i|\neg\beta,\lambda_c}(\lambda_2, c)]^2 \Big\} , \qquad (28)$$

where the $\sigma_i^2(R(r, \lambda_2), c)$'s are defined by Equation 15.

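Proof: By definition, the conditional expectation of the fitness contribution of subfunction $i$ is⁷

$$\mathcal{E}[\phi_i(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c] = \sum_{x_2 \in I_{\neg\beta}(\lambda_2)} \phi_i(x_2, c) \cdot \Pr[X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] .$$

Because $\{R(0, \lambda_2), \ldots, R(k, \lambda_2)\}$ is a partition of $I_{\neg\beta}(\lambda_2)$, this may be written as

$$\mathcal{E}[\phi_i(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c] = \sum_{r=0}^{k} \sum_{x_2 \in R(r, \lambda_2)} \phi_i(x_2, c) \cdot \Pr[X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] .$$

By Theorem 5.3.10, the conditional probability density of $X_2$ is constant over each $R(r, \lambda_2)$, so that

$$\begin{aligned}
\mathcal{E}[\phi_i(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c]
&= \sum_{r=0}^{k} \sum_{x_2 \in R(r, \lambda_2)} \phi_i(x_2, c) \cdot f_{2,r} \\
&= \sum_{r=0}^{k} f_{2,r} \sum_{x_2 \in R(r, \lambda_2)} \phi_i(x_2, c) \\
&= \sum_{r=0}^{k} f_{2,r} \cdot \mathrm{card}(R(r, \lambda_2)) \cdot \mu_i(R(r, \lambda_2), c) ,
\end{aligned}$$

which completes the proof of the claimed conditional expectation. Similarly, the conditional variance of $\phi_i(X_2, c)$ is by definition⁸

$$\begin{aligned}
\mathrm{Var}[\phi_i(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c]
&= \sum_{x_2 \in I_{\neg\beta}(\lambda_2)} \big\{ \phi_i(x_2, c) - \mathcal{E}[\phi_i(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c] \big\}^2 \cdot \Pr[X_2 = x_2 \mid \lambda_c(X_1, X_2) = \lambda_c] \\
&= \sum_{r=0}^{k} \Big[ f_{2,r} \cdot \mathrm{card}(R(r, \lambda_2)) \cdot \mathcal{E}\big[ \{\phi_i(X_2, c) - \mu_{i|\neg\beta,\lambda_c}(\lambda_2, c)\}^2 \mid X_2 \in R(r, \lambda_2) \big] \Big] ,
\end{aligned}$$

where we have again used the facts that the $R(r, \lambda_2)$'s form a partition of $I_{\neg\beta}(\lambda_2)$ and that the conditional density of $X_2$ is constant over each $R(r, \lambda_2)$. Upon simplification using Lemma 5.3.2, the result follows immediately. ■

⁷See footnote 4.

⁸See footnote 5.

The following corollary presents the central moments of the conditional fitness distribution of the class $I_{\neg\beta}$.

Corollary 5.3.12 (Conditional fitness expectations of individuals lacking building block) Let $\Phi = \sum_i \phi_i : I \times I_F \to \mathbb{R}$ be a separable fitness function, and $c \in I_F$. Suppose $X_1 \sim U(I_\beta(\lambda_1))$ and $X_2 \sim U(I_{\neg\beta}(\lambda_2))$. Then the conditional expectation of $\Phi_c(X_2)$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\mathcal{E}[\Phi_c(X_2) \mid \lambda_c(X_1, X_2) = \lambda_c] = \sum_{i=1}^{m} \mu_{i|\neg\beta,\lambda_c}(\lambda_2, c) ,$$

where the $\mu_{i|\neg\beta,\lambda_c}(\lambda_2, c)$'s are defined by Equation 27. Furthermore, the conditional variance of $\Phi_c(X_2)$ given that $\lambda_c(X_1, X_2) = \lambda_c$ is

$$\mathrm{Var}[\Phi_c(X_2) \mid \lambda_c(X_1, X_2) = \lambda_c] = \sum_{i=1}^{m} \sigma^2_{i|\neg\beta,\lambda_c}(\lambda_2, c) + 2 \sum\sum_{i<j} \mathrm{Cov}[\phi_i(X_2, c), \phi_j(X_2, c) \mid \lambda_c(X_1, X_2) = \lambda_c] ,$$

where the $\sigma^2_{i|\neg\beta,\lambda_c}(\lambda_2, c)$'s are defined by Equation 28.

Proof: The results follow immediately from the linearity of the expected value operator and Theorem 4, §4.9, of Hogg and Craig [41], respectively. ■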
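The stratified form of Equations 27 and 28 is a mixture computation: each stratum R(r) contributes its own mean and variance, weighted by its total conditional probability card(R(r)) · f_{2,r}. The Python sketch below is a hedged illustration with invented strata and densities (none of the numbers come from the thesis), and the function name is hypothetical. It checks the stratified moments against moments computed directly from the non-uniform density.

```python
from statistics import fmean, pvariance


def mixture_moments(weights, means, variances):
    """Overall mean and variance from per-stratum weights, means and variances.

    weights[r] plays the role of card(R(r)) * f_{2,r} and the weights sum to 1.
    """
    mu = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + (m - mu) ** 2)
              for w, m, v in zip(weights, means, variances))
    return mu, var


# Invented strata: subfunction contributions of the individuals in each R(r).
strata = [[0.58, 0.58], [0.435, 0.29, 0.0], [1.0]]
# Invented per-individual conditional density, constant within each stratum.
density = [0.1, 0.2, 0.2]

weights = [len(s) * f for s, f in zip(strata, density)]
means = [fmean(s) for s in strata]
variances = [pvariance(s) for s in strata]
mu, var = mixture_moments(weights, means, variances)

# Direct computation from the individual-level density.
direct_mu = sum(f * v for s, f in zip(strata, density) for v in s)
direct_var = sum(f * (v - direct_mu) ** 2
                 for s, f in zip(strata, density) for v in s)
print(abs(mu - direct_mu) < 1e-12, abs(var - direct_var) < 1e-12)
```

The squared offset between each stratum mean and the overall mean is exactly the term supplied by Lemma 5.3.2.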


5.4 Application of Tournament Selection Model

Under  certain  conditions,  the  model  developed  in  Sections  5.1  and  5.2  enables  exact  determination  of  the 
expected  state  following  binary  tournament  selection.  Specifically,  for  certain  classes  of  binary  tournament 
selection  algorithms  (identified  in  Section  5.2),  the  model  exactly  predicts  the  probability  that  an  individual 
in  population  P(t)  belongs  to  one  of  two  classes.  This  section  demonstrates  the  validity  of  the  model.  The 
application  chosen  for  the  demonstration  is  the  prediction  of  the  fraction  of  individuals  which  contain  a 
particular  building  block  in  each  generation  of  one  selection  episode  of  a  fast  messy  genetic  algorithm.  The 
experimental  design  is  discussed  in  Section  5.4.1,  and  the  results  are  presented  in  Section  5.4.2. 


5.4.1 Experimental Design. The fraction of individuals containing a particular building block in each generation is compared to the fraction predicted by the proposed model. The predicted mean and variance of the fitnesses of individuals containing the building block are also compared to the observed values, and similarly for individuals lacking the building block. The fast messy genetic algorithm is executed⁹ ten times using different random seeds to facilitate statistically significant conclusions. Standard fast messy genetic algorithm parameters are used, including no thresholding in the first episode. Other relevant¹⁰ parameters are presented in the remainder of this section, which discusses the fitness function and modeling assumptions used.


The fitness function for these experiments is that used by Goldberg et al. to demonstrate the feasibility of the fast messy genetic algorithm [35]. This function, a "tightly-coded 50-bit order-5 fully deceptive trap function," as well as the underlying lfGA representation, is defined in the following in the notation of Section 2.6.1 and Definition 4.1.2.

⁹These experiments are performed on one node of an Intel Paragon using AFIT's fast messy genetic algorithm implementation [JSj], modified to collect statistics.

¹⁰Many of the fast messy genetic algorithm parameters have no impact on the experiments performed here. In particular, because no thresholding is used, the shuffle size is of no consequence. Also irrelevant are the cut and splice probabilities, the durations of the primordial and juxtapositional phases, and the filtering and thresholding parameters for other than the first episode.

The individual space is defined over genic alphabet $\mathcal{A} = \{0, 1\}$, with nominal string length $\ell = 50$ and overflow factor $o = 1.6$, so that

$$I = \bigcup_{\lambda=0}^{80} \big( \{0, 1\}^\lambda \times \{1, \ldots, 50\}^\lambda \big) ,$$

and the set of fully specified individuals is

$$I_F = I(50) = \{((a_1, \ldots, a_{50}), (l_1, \ldots, l_{50})) \in I : l_i = l_j \Rightarrow i = j\} .$$

The overlay mapping is $\Gamma : I \times I_F \to \mathcal{A}^\ell$ as defined by Equation 3. For $i \in \{1, \ldots, 10\}$, take $\mathcal{L}_i = \{5i - 4, \ldots, 5i\}$, corresponding to a "tight" coding, so that the projection mappings $\mathcal{P}_{\mathcal{L}_i} : \{0, 1\}^{50} \to \{0, 1\}^5$ are

$$\mathcal{P}_{\mathcal{L}_i}(a_1, \ldots, a_{50}) = (a_{5i-4}, \ldots, a_{5i}) .$$

For $i \in \{1, \ldots, 10\}$, the (identical) decoding subfunctions are $D_i = D$ where $D : \{0, 1\}^5 \to \mathbb{R}$ is the "counting ones" function

$$D(a_1, \ldots, a_5) = \mathrm{card}(\{i \in \{1, \ldots, 5\} : a_i = 1\}) .$$

Also for $i \in \{1, \ldots, 10\}$, the (identical) objective subfunctions are $f_i = f$ where $f : \mathbb{R} \to \mathbb{R}$ is the "trap" function

$$f(x) = \begin{cases} 0.58\,(4 - x)/4 , & \text{if } x \le 4 \\ 1.00 , & \text{if } x > 4 . \end{cases}$$

Finally, the fitness function $\Phi : I \times I_F \to \mathbb{R}$ is

$$\Phi = \sum_{i=1}^{10} \phi_i ,$$

where each $\phi_i = f_i \circ D_i \circ \mathcal{P}_{\mathcal{L}_i} \circ \Gamma$ is a fitness subfunction. The competitive template for these experiments is $c = ((0, \ldots, 0), (1, \ldots, 50))$, as used by Goldberg et al. [35]. The initial population of the fast messy genetic algorithm is drawn from a uniform distribution over $I(\ell - k)$; thus all initial individuals have length $\ell - k = 45$.
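The composition of overlay, projection, counting-ones decoding and trap function can be sketched in Python. This is an illustrative sketch rather than the implementation used in the experiments, and several details are assumptions: the trap is scaled as 0.58(4 - u)/4 so that an all-zero block scores 0.58 (the value the text reports for the template's own contribution), the overlay rule for over-specified loci is taken to be first-come-first-served because Equation 3 is not reproduced here, and an individual is represented as a list of (locus, allele) pairs with loci numbered from 1.

```python
def trap(u):
    """Order-5 trap on the number of ones u (scaling by 1/4 is an assumption)."""
    return 1.0 if u > 4 else 0.58 * (4 - u) / 4


def overlay(individual, template):
    """Overlay the genes of an individual onto a copy of the template.

    Assumed rule: the first gene seen for a locus wins.
    """
    string = list(template)
    seen = set()
    for locus, allele in individual:
        if locus not in seen:
            string[locus - 1] = allele
            seen.add(locus)
    return string


def fitness(individual, template):
    """Sum of ten tightly coded 5-bit trap subfunctions over the overlaid string."""
    string = overlay(individual, template)
    return sum(trap(sum(string[5 * i:5 * i + 5])) for i in range(10))


template = [0] * 50                           # all-zero competitive template
block_1 = [(l, 1) for l in range(1, 6)]       # individual holding building block 1
print(fitness([], template), fitness(block_1, template))
```

An individual that sets only some of the five bits of a block lowers that subfunction's contribution below the template's 0.58, which is the sense in which such an individual disrupts the template.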

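It is clear that $\Phi_c$ is an order-5 separable lfGA fitness function. Thus, the decompositions of the central moments presented in Section 5.3 are applicable. The assumption of zero covariances throughout is somewhat justified, because the condition $\lambda \gg k$ results in near-independence of the subfunction contributions. It is then straightforward to obtain

$$\begin{aligned}
\phi_i^* &= 1 , \\
\phi_i(c, c) &= 0.58 , \\
\mu_i^-(I_\beta(\lambda), c) &= 0.2471 , \text{ for } i \ne \beta , \\
(\sigma_i^-)^2(I_\beta(\lambda), c) &= 0.0187 , \text{ for } i \ne \beta , \\
\mu_i^-(I_{\neg\beta}(\lambda), c) &= 0.2441 , \text{ for } i \in \{1, \ldots, 10\} ,
\end{aligned}$$

and

$$(\sigma_i^-)^2(I_{\neg\beta}(\lambda), c) = 0.0188 , \text{ for } i \in \{1, \ldots, 10\} .$$

The values of $\mu_i^-(I_\beta(\lambda), c)$ and $\mu_i^-(I_{\neg\beta}(\lambda), c)$ are unmistakably similar, as are those of $(\sigma_i^-)^2(I_\beta(\lambda), c)$ and $(\sigma_i^-)^2(I_{\neg\beta}(\lambda), c)$. This similarity is due to the previously mentioned near-independence of the subfunction contributions.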



The mean and variance of the fitness distribution for individuals containing a particular building block are thus 3.493 and 0.290, respectively. Similarly, the unconditional mean and variance for individuals lacking the building block are 2.730 and 0.5662, which are essentially identical to the corresponding conditional values (for all $\lambda_c$ such that the conditional distribution exists).

In  order  to  gain  analytical  tractability,  the  decision  making  model  proposed  in  Section  5.1  neglects 
the  possibility  of  ties  by  assuming  that  the  fitnesses  of  individuals  are  random  variables  of  the  continuous 
type.  For  the  fitness  function  employed  in  these  experiments,  this  assumption  does  not  hold.  Thus,  for 
purposes  of  the  decision  making  model,  the  computational  experiments  reported  here  approximate  the  fitness 
distributions  by  normal  distributions  with  the  means  and  variances  just  calculated. 

The expressions given in Section 5.1 for the probabilities of correct decision making also assume that
the fitnesses of the ancestors of the competing individuals are mutually independent. This assumption holds
provided that the ancestors are distinct. In the absence of thresholding, each individual possesses a maximum
of $2^t$ ancestors in generation $t$. Thus, if all of $x_1$'s ancestors are distinct, and likewise those of $x_2$, then
the probability in a finite population of size $N$ that a specific ancestor of $x_1$ is also an ancestor of $x_2$ is
$2^t N^{-1}$. Neglecting the statistical dependence of a particular individual's ancestors, the expected number of
non-distinct ancestors is therefore $(2^t N^{-1}) 2^t = 4^t N^{-1}$. That is, less than one common ancestor is expected
provided that $t < \log_4 N$, which is one half of the “takeover time.” These experiments use a population size of
$N = 1786$. On the basis of the preceding argument, it is reasonable to expect the probabilities of correct
decision making to be accurate through generation $5 < \log_4 1786 \approx 5.4$.
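The ancestor-counting argument above can be checked numerically. The following minimal sketch assumes the expected number of shared ancestors is $4^t / N$, as in the argument; the helper names are illustrative and not part of the original work.

```python
import math

def expected_common_ancestors(t: int, population_size: int) -> float:
    """Expected number of shared ancestors after t generations: (2^t / N) * 2^t."""
    return (2 ** t / population_size) * 2 ** t

def last_reliable_generation(population_size: int) -> int:
    """Largest t >= 0 such that 4^t < N (assumes N > 1)."""
    t = 0
    while 4 ** (t + 1) < population_size:
        t += 1
    return t

N = 1786
horizon = math.log(N, 4)                  # real-valued bound log_4 N
reliable = last_reliable_generation(N)    # integer generation below the bound
```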

Finally,  after  the  first  iteration  of  selection,  closed  form  solutions  for  the  probabilities  of  correct  decision 
making  do  not  exist.  Consequently,  these  values  are  obtained  numerically  (see  Appendix  B). 

5.4.2 Experimental Results. The predicted and observed fraction of individuals containing the
building block in each generation is shown in Figure 24. The predicted state is accurate through the third
generation, after which it becomes overly “optimistic.” That is, it predicts a greater fraction of individuals
containing the building block than is observed. The source of this over-optimism may be explained by
comparing the predicted and observed fitness distribution moments.

Figure 24. Predicted vs. Observed fmGA State

The predicted and observed mean fitnesses for individuals containing the building block are shown in
Figure 25. Again, the prediction is initially accurate, but after the fourth generation, it underestimates the
actual mean. The inaccuracy in this prediction begins in a later generation than the inaccuracy in the state
prediction. Thus, the former is an effect of, rather than a cause of, the latter inaccuracy. The mean fitness of
individuals lacking the building block is shown in Figure 26. The results are qualitatively similar to those for
individuals containing the building block.

Figure 25. Predicted vs. Observed Mean Fitness with Building Block

The predicted fitness standard deviations are compared in Figure 27 to their observed counterparts for
individuals containing the building block. Although not as accurate as the prediction of the distribution
means, the prediction for this statistic is well within the range of observed values. One notable difference
between this result and those for the distribution means is the inaccuracy in the prediction for the initial
population. This is attributable to the fact that the prediction neglects the covariances of the subfunction
contributions, which although small, can easily be seen to be negative. The same remark applies to the
predicted standard deviation of the fitnesses of those individuals lacking the building block, which is shown
in Figure 28 together with the observed values. This is by far the least accurate of the predictions considered
here, and the inaccuracy begins in an earlier generation than that of the state or the distribution mean
predictions. It is reasonable to conclude that it is the source of much of the inaccuracy in the other
predictions.

5.5  Summary 

This  chapter  develops  a  dynamical  systems  model  of  binary  tournament  selection  with  probabilistic 
thresholding.  The  key  components  of  the  model  are  an  order-statistics  based  decision  making  model  and 
a  hierarchical  Markov  chain  model  (see  Figure  29).  Together  with  the  probabilistic  building  block  filtering 
model  developed  in  Chapter  IV,  the  model  allows  prediction  of  expected  effectiveness  resulting  from  a  choice 
of  exogenous  parameters.  The  prediction  of  expected  effectiveness  serves  as  the  basis  for  the  parameter 
selection techniques proposed in Chapter VI.

Figure 29. Flow of Information in Dynamical Systems Model of Binary Tournament Selection with Probabilistic
Thresholding

VI. Selection of Exogenous Parameters

The fast messy genetic algorithm (fmGA) described in Section 2.6.4 is employed as an optimum seeking
technique in several limited studies [27, 28, 35, 54]. These studies and this research show theoretically and
empirically that the fmGA exhibits a number of advantages over the simple genetic algorithm (Section 2.4),
the messy genetic algorithm (Section 2.6.3), and other optimum seeking techniques.

Practical use of the fmGA is limited by the lack of an acceptable methodology for selection of its
numerous exogenous parameters, upon which its effectiveness depends. In particular, experience [28] shows
that the effectiveness of the algorithm depends strongly on the filtering and thresholding parameters.
Existing parameter selection techniques (see Section 2.6.5) are essentially heuristic and do not reliably yield
satisfactory effectiveness in practical applications. Furthermore, they do not predict the expected
effectiveness of the algorithm resulting from a given set of parameters, nor whether improved effectiveness
may result from “tweaking” the parameters.

This chapter addresses the exogenous parameter selection problem for both the fmGA and the
generalized fast messy genetic algorithm (gfmGA) described in Chapter III. The parameter selection problem
is formally posed as an optimization problem (Section 6.1), for which the cost function is related to the
expected effectiveness resulting from a particular choice of exogenous parameter settings. The definition of
the cost function involves the mathematical models of probabilistic building block filtering (BBF) and binary
tournament selection (BTS) with probabilistic thresholding developed in Chapters IV and V, respectively.

Because the fmGA filtering and thresholding parameters are discrete, the resulting optimization
problem is combinatorial. A hill-climbing-based fmGA parameter selection technique is proposed in Section 6.2.
In contrast, the gfmGA parameters are real-valued. Section 6.3 discusses the use of vector space
optimization techniques to obtain a set of necessary optimality conditions (NOCs) for the parameters of the
gfmGA. Parameter selection techniques for the gfmGA are proposed based on numerical solution of the NOCs
and computational optimization of the cost functional.


6.1  Formal  Statement  of  the  Parameter  Selection  Problem 


The formal statement of the linkage-friendly genetic algorithm (lfGA) exogenous parameter selection
problem as an optimization problem is based on the models of probabilistic BBF and BTS with probabilistic
thresholding presented in Chapters IV and V, respectively. In particular, the cost functional is defined as
an error between the expected final state and the ideal final state u, where the expected state is defined in
terms of the population vector. The population vector is of the form

\[
P \triangleq
\begin{bmatrix}
p_{10} & p_{11} & \cdots & p_{1n} \\
p_{20} & p_{21} & \cdots & p_{2n} \\
\vdots & \vdots &        & \vdots \\
p_{m0} & p_{m1} & \cdots & p_{mn}
\end{bmatrix} .
\]

Each component $p_{ij}$ is of the form of Equation 12, where A is the class of individuals containing building
block $i$. The $p_{ij}$'s are viewed as conditional probabilities that an individual randomly drawn from the
population contains building block $i$ given that it is of length $j$.

The matrix $u(x, t)$ is defined such that its $(i, j)$th component is the probability that an individual
randomly sampled from population $P(t)$ contains building block $i$ and is of length $j$, given the exogenous
parameter set $x$. The expected state in generation $t$ is the vector

\[
J_u(x, t) = u(x, t)\,\mathbf{1} ,
\]

the $i$th component of which is the probability that an individual randomly sampled from population $P(t)$
contains building block $i$, given the exogenous parameter set $x$.

The exogenous parameters (excluding the $\psi(0; k, t)$'s and $\phi(0; k, t)$'s) for iteration $k$ of the generalized
fast messy genetic algorithm are taken from the vector space $X$. They are subject to the inequality constraints
given in Equations 6 and 7, which are represented formally by

\[
G(x) \triangleq
\begin{bmatrix}
\hat{G}(x) \\
-x
\end{bmatrix}
\le \theta_Z ,
\]

where $G : X \to Z$ and $\hat{G}$ denotes the upper block of $G$.

The exogenous parameter selection problem may thus be formally stated as:

Find the set of exogenous parameters $x \in X$ which minimizes the
cost functional $J : X \to \mathbb{R}$,

\[
J(x) = \tfrac{1}{2} \, \| J_u(x) - u \|_2^2 ,
\]

subject to $G(x) \le \theta_Z$.
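
To make the definitions concrete, the sketch below computes an expected state as the row sums of a joint probability matrix and evaluates a cost, assuming the cost is half the squared Euclidean distance between the expected and ideal states. The matrix entries and the ideal state of all ones are made-up illustrative values, and the function names are hypothetical.

```python
def expected_state(joint):
    """Row sums of the joint matrix: entry i is P(individual contains building block i)."""
    return [sum(row) for row in joint]

def cost(joint, ideal):
    """Half the squared Euclidean distance between the expected and ideal states."""
    state = expected_state(joint)
    return 0.5 * sum((s - g) ** 2 for s, g in zip(state, ideal))

# Hypothetical joint probabilities for m = 2 building blocks and lengths j = 0, 1, 2.
# Entry [i][j] is P(contains building block i and has length j).
joint = [[0.0, 0.1, 0.5],
         [0.0, 0.2, 0.2]]
ideal = [1.0, 1.0]   # assumed ideal: every individual contains every building block

state = expected_state(joint)
j_value = cost(joint, ideal)
```

Summing over the length index is the same as multiplying the matrix by a vector of ones, which is why the expected state needs no information about individual lengths.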
6.2 fmGA Parameter Selection

This section proposes an exogenous parameter selection technique for the fast messy genetic algorithm
which bears some resemblance to the technique proposed by Kargupta. The most significant advantage
of this technique over Kargupta's is that the choice of thresholding parameters explicitly considers the
expected effectiveness of the algorithm. Also, as a consequence of the underlying tournament selection
model (Chapter V), this choice reflects the dynamic nature of the probability of correct decision making.
Another advantage is that all of the design parameters required by the technique (the nominal string length
$\ell$, the estimated level of deception $k$, and the assumed initial fitness distributions) are already required by
the fast messy genetic algorithm.

Luenberger presents numerous mathematical techniques for the optimization of functionals defined
on vector spaces or subsets of vector spaces satisfying specific conditions [50]. The remainder of this section
considers their use in the fmGA parameter selection problem, and shows that they are not, in general,
directly applicable.

Both the pre-Hilbert space form and the classical form of the Projection Theorem require that the set
over which optimization is performed be a vector space. The same is true of the techniques presented for
solution of minimum norm problems. When $\ell + 1$ is not prime, no operations $\oplus$ and $\otimes$ exist such that B
together with $\oplus$ and $\otimes$ forms a vector space. Thus, the projection theorems and the minimum norm problem
techniques are not, in general, directly applicable.

The Fenchel Duality Theorem and the Lagrange Multiplier theorems for global theory require that the
set over which optimization is performed be a convex subset of a vector space. Again, unless $\ell + 1$ is prime,
B does not satisfy this condition, so these theorems are also not, in general, directly applicable.

The Lagrange Multiplier theorem for local theory requires that the functional to be optimized be
Fréchet-differentiable, which implies that the set on which it is defined is a vector space. Likewise, the
Generalized Kuhn-Tucker Theorem requires that the functional to be optimized be defined on a vector
space. Thus, these theorems are also not, in general, directly applicable.

The  proposed  technique  is  as  shown  in  Figure  30.  A  possible  disadvantage  of  this  technique  is  the 

1.  Take  =  i  ^  k.  Set  e  =  0. 

2.  For  each  candidate  threshold  $  e  -I . A^^U.  find  i  which  minimizes 

J. 

3.  Take  to  be^the  6  which  yields  the  overall  minimum  J.  Take  to  be  the 
corresponding  t. 

4.  Take 

^(e  +  l)  _  f  ^  >  1 1 

I  *  QhO  ) 

5.  Set  e  =  e  “h  1.  If  A^^^  >  k  goto  step  2. 

Figure  30.  Fast  Messy  Genetic  Algorithm  Parameter  Selection  Technique 

computationally intensive nature of the second step, in which the optimal selection episode duration and
associated effectiveness are determined for each meaningful choice of the threshold parameter. For “difficult”
optimization problems, multiple independent runs of the fast messy genetic algorithm are necessary, and it is
appropriate to “amortize” the computational cost of parameter selection over the number of runs performed.
Also, the same (or better) overall effectiveness may result from a smaller number of runs of a more effective
algorithm as from a larger number of runs of a less effective algorithm. Thus, for “difficult” optimization
problems, the computational cost of the second step is justified.

The  technique  may  be  viewed  as  a  hillclimbing  strategy,  in  the  sense  that  each  instance  of  the  third 
step  specifies  a  locally  optimal  choice  of  the  threshold  parameter  and  selection  episode  duration.  Kargupta’s 
technique  may  also  be  viewed  as  a  hillclimbing  strategy,  with  a  different  criterion  for  local  optimality  which 
does  not  consider  expected  effectiveness. 
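
The second and third steps of the technique can be sketched for a single selection episode as follows. This is a simplified illustration rather than the procedure of Figure 30 itself: the cost function `toy_cost` is a hypothetical stand-in for the model-based cost J, and the candidate ranges are arbitrary.

```python
def select_episode_parameters(thresholds, durations, cost):
    """Return the (threshold, duration, cost) triple with the smallest cost."""
    best = None
    for theta in thresholds:
        # Second step: best episode duration for this candidate threshold.
        t_best = min(durations, key=lambda t: cost(theta, t))
        candidate = (theta, t_best, cost(theta, t_best))
        # Third step: keep the threshold yielding the overall minimum.
        if best is None or candidate[2] < best[2]:
            best = candidate
    return best

def toy_cost(theta, t):
    """Hypothetical stand-in for J, minimized at theta = 3 and t = 4."""
    return (theta - 3) ** 2 + 0.5 * (t - 4) ** 2

theta_star, t_star, j_star = select_episode_parameters(range(0, 8), range(1, 10), toy_cost)
```

The exhaustive inner search over durations is what makes the second step expensive when each cost evaluation requires running the dynamical systems model.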

The  choice  of  filtering  parameters  in  the  fourth  step  is  motivated  by  the  stated  design  objective  of 
Goldberg,  et  al.  [35]  and  Kargupta  [47].  That  is,  it  ensures  that  after  filtering  each  building  block  is  expected 
to  have  at  least  one  copy  in  the  population.  Importantly,  the  choice  is  made  based  on  the  expected  number 
of  copies  of  the  least  well  represented  building  block,  and  the  model  does  not  assume  that  the  number  of 
copies  of  that  building  block  doubles  in  each  generation  of  tournament  selection. 

6.3  gfmGA  Parameter  Selection 

The filtering and thresholding parameters of the gfmGA are real-valued. Furthermore, the cost
function defined in Section 6.1 is continuously differentiable with respect to the parameters. Consequently,
vector space optimization techniques [50] may be used to obtain necessary optimality conditions (NOCs) for
the parameter selection problem. This section discusses the application of the Generalized Kuhn-Tucker
Theorem to obtain NOCs, and discusses parameter selection techniques for the gfmGA. Luenberger states the
Generalized Kuhn-Tucker Theorem essentially as follows:

Theorem 6.3.1 (Generalized Kuhn-Tucker) Let $X$ be a vector space and $Z$ a normed space having
positive cone $P$. Assume that $P$ contains an interior point. Let $f$ be a Gateaux differentiable¹ real-valued
functional on $X$ and $G$ a Gateaux differentiable mapping from $X$ into $Z$. Assume that the Gateaux
differentials are linear in their increments. Suppose $x_0$ minimizes $f$ subject to $G(x) \le \theta_Z$ and that $x_0$ is a
regular point of the inequality $G(x) \le \theta_Z$. Then there is a $z_0^* \in Z^*$, $z_0^* \ge \theta_{Z^*}$, such that the Lagrangian

\[
f(x) + \langle G(x), z_0^* \rangle
\]

is stationary at $x_0$; furthermore, $\langle G(x_0), z_0^* \rangle = 0$.

Proof: See Luenberger [50]. ■

As  an  immediate  consequence  of  this  theorem,  a  set  of  necessary  optimality  conditions  for  the  exogenous 
parameter  selection  problem  is  obtained. 


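Corollary 6.3.2 Let the cost function $J$ and the constraint mapping $G$ be as defined in Section 6.1, and
let $\hat{G} : \mathbb{R}^n \to \mathbb{R}^m$ denote the block of $G$ other than $-x$. Suppose $x_0$ minimizes $J$ subject to
$G(x) \le \theta_Z$. Then there exist $z_m \in \mathbb{R}^m$ and $z_n \in \mathbb{R}^n$ such that

\[
z_m \ge \theta_{\mathbb{R}^m} , \tag{29}
\]

\[
z_n \ge \theta_{\mathbb{R}^n} , \tag{30}
\]

\[
J_x(x_0) + z_m^T \hat{G}_x(x_0) - z_n^T = \theta_{\mathbb{R}^n}^T , \tag{31}
\]

and

\[
z_m^T \hat{G}(x_0) = z_n^T x_0 = 0 . \tag{32}
\]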


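¹ The Gateaux differential is a generalization to arbitrary vector spaces of the directional differential.
Following Luenberger [50], let $X$ be a vector space, $Y$ a normed space, $D \subset X$, $T : D \to Y$, $x \in D$, and
$h \in X$. If the limit

\[
\delta T(x; h) = \lim_{\alpha \to 0} \frac{1}{\alpha} \left[ T(x + \alpha h) - T(x) \right]
\]

exists, it is called the Gateaux differential of $T$ at $x$ with increment $h$.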


Proof: The constraint space $Z = \mathbb{R}^{m+n}$ is a Euclidean vector space, hence the positive cone $P$ is the
first orthant. $P$ contains interior points, e.g. $(1, \ldots, 1)$. The transition operators defined in Chapters IV
and V are both differentiable with respect to each of the filtering and thresholding parameters.
Consequently, the mapping $J_u$ defined in Section 6.1 is also differentiable with respect to the parameters,
and furthermore, so is $J$. The constraint mapping $G$ is also differentiable with respect to the parameters,
and every point satisfying $G(x) \le \theta_Z$ also is a regular point of the inequality (i.e. there are no cusps in the
constraint boundaries). Thus, the conditions of Theorem 6.3.1 are satisfied.

Up to isomorphism, $J : \mathbb{R}^n \to \mathbb{R}$ and $G : \mathbb{R}^n \to \mathbb{R}^{m+n}$. Suppose $x_0 \in \mathbb{R}^n$ minimizes $J$ subject to
$G(x_0) \le \theta_{\mathbb{R}^{m+n}}$. Then $x_0$ is a regular point of $G(x) \le \theta_{\mathbb{R}^{m+n}}$, and (following Luenberger [50:Ex. 2, §9.4])
the constraint may be written

\[
\hat{G}(x_0) \le \theta_{\mathbb{R}^m} \tag{33}
\]

and

\[
-x_0 \le \theta_{\mathbb{R}^n} , \tag{34}
\]

where $\hat{G} : \mathbb{R}^n \to \mathbb{R}^m$.

Theorem 6.3.1 implies that there exist $z_m \in \mathbb{R}^m$ and $z_n \in \mathbb{R}^n$ such that

\[
z_m \ge \theta_{\mathbb{R}^m} , \tag{35}
\]

\[
z_n \ge \theta_{\mathbb{R}^n} , \tag{36}
\]

\[
J(x) + z_m^T \hat{G}(x) + z_n^T (-x) \quad \text{is stationary at } x_0 , \tag{37}
\]

and

\[
z_m^T \hat{G}(x_0) + z_n^T (-x_0) = 0 . \tag{38}
\]

Condition 37 may be written

\[
J_x(x_0) + z_m^T \hat{G}_x(x_0) - z_n^T = \theta_{\mathbb{R}^n}^T .
\]

Conditions 33 and 35 imply that $z_m^T \hat{G}(x_0) \le 0$, while Conditions 34 and 36 imply that $-z_n^T x_0 \le 0$. These
conditions, together with Condition 38, imply that $z_m^T \hat{G}(x_0) = z_n^T x_0 = 0$, which completes the proof. ■
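
The conditions of the corollary can be checked numerically at a candidate point. The sketch below does so for a small made-up problem that is unrelated to the gfmGA cost functional: it tests multiplier non-negativity, stationarity of the Lagrangian, and complementary slackness. All names (`check_nocs`, `g_hat`, the toy functions) are illustrative assumptions.

```python
def check_nocs(grad_j, g_hat, jac_g_hat, x0, z_m, z_n, tol=1e-9):
    """True if (x0, z_m, z_n) satisfies sign, stationarity and slackness conditions."""
    n, m = len(x0), len(z_m)
    # Multipliers must be non-negative.
    signs_ok = all(z >= -tol for z in z_m) and all(z >= -tol for z in z_n)
    # Stationarity: grad J + z_m^T (Jacobian of g_hat) - z_n^T = 0.
    gj = grad_j(x0)
    jac = jac_g_hat(x0)                     # m rows, n columns
    stationarity = [
        gj[j] + sum(z_m[i] * jac[i][j] for i in range(m)) - z_n[j]
        for j in range(n)
    ]
    stationary_ok = all(abs(s) <= tol for s in stationarity)
    # Complementary slackness: z_m^T g_hat(x0) = 0 and z_n^T x0 = 0.
    g = g_hat(x0)
    slack_m = sum(z_m[i] * g[i] for i in range(m))
    slack_n = sum(z_n[j] * x0[j] for j in range(n))
    slack_ok = abs(slack_m) <= tol and abs(slack_n) <= tol
    return signs_ok and stationary_ok and slack_ok

# Toy problem (hypothetical): minimize 1/2 [(x1 - 1)^2 + (x2 + 2)^2]
# subject to x1 + x2 - 3 <= 0 and -x <= 0. The minimizer is (1, 0).
grad_j = lambda x: [x[0] - 1.0, x[1] + 2.0]
g_hat = lambda x: [x[0] + x[1] - 3.0]
jac_g_hat = lambda x: [[1.0, 1.0]]

ok = check_nocs(grad_j, g_hat, jac_g_hat, x0=[1.0, 0.0], z_m=[0.0], z_n=[0.0, 2.0])
bad = check_nocs(grad_j, g_hat, jac_g_hat, x0=[1.0, 1.0], z_m=[0.0], z_n=[0.0, 0.0])
```

In the toy problem only the bound on the second coordinate is active, so only its multiplier is nonzero; the inactive constraints carry zero multipliers, as complementary slackness requires.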

Equations 29 through 32, which form a system of simultaneous non-linear equations in the exogenous
parameters, are referred to as the necessary optimality conditions (NOCs). Because the cost functional $J$ is
continuous on the feasible region, which is a compact subset of a metric space, $J$ attains its minimum on the
region (see Theorem 4.28 of Apostol [3]), i.e. there exists an $x_0$ which minimizes $J$ subject to $G(x) \le \theta_Z$.
The parameter set $x_0$ yields optimal expected effectiveness of the generalized fast messy genetic algorithm.
In principle, the exogenous parameter selection problem reduces to the problem of finding $x_0$.

By the preceding argument, the existence of at least one solution $x_0$ of the NOCs is guaranteed.
Under certain conditions, the solution is unique, in which case Equations 29 through 32 are both necessary
and sufficient for optimality. In particular, if the Hessian matrix² of $J$ is positive definite on the entire
feasible region, then there exists a unique minimum of $J$ on the region, and hence $x_0$ is unique. Because the
constraints $G(x) \le \theta_Z$ define a convex region of the parameter space, $x_0$ is also unique in the more general
case that $J$ is convex on the region. Finally, $x_0$ may be unique even if $J$ is not convex.

Analysis of the positive definiteness of the Hessian matrix of $J$ via explicit derivation of the partial
derivatives is tedious and unrewarding, as is explicit analysis of the convexity of $J$. The question of the
uniqueness of $x_0$ may be addressed more directly. Because $J$ is well approximated by a high-order polynomial
in the exogenous parameters, the roots of which depend on the objective function, it seems likely that there
exist (many) objective functions for which the stationary points of $J$ include points of local maximum, saddle
points, and multiple points of local minimum. For generality, it is assumed in the sequel that the solution
of the NOCs is not unique.

² Let $f : \mathbb{R}^n \to \mathbb{R}$ and $p \in \mathbb{R}^n$. Then the matrix $A$ whose components are

\[
A_{ij} \triangleq \left. \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j} \right|_{p}
\]

is the Hessian matrix of $f$ at $p$ [64].

In principle, the parameter selection problem reduces to the problem of finding the solution $x_0$ of the
NOCs for which $J$ is minimized. In practice, the explicit form of the resulting expressions does not suggest
a direct solution technique, despite extensive analysis and consultation. Thus, solutions must be obtained
numerically. Standard techniques for simultaneous solution of non-linear equations include Newton-Raphson,
globally convergent extensions thereof, and Broyden’s Method [64].

Newton-Raphson is perhaps the simplest and best known multidimensional root finding technique.
Given a “good” initial guess of the location of a root, it converges quadratically to the root. For a system
of equations of the form $F(x) = 0$ with Jacobian matrix $\mathbf{J}$ (not to be confused with the cost functional $J$),
the update rule is

\[
x_{\text{new}} = x_{\text{old}} + \delta x ,
\]

where $\delta x$ satisfies

\[
\mathbf{J} \cdot \delta x = -F .
\]

Given a “poor” initial guess, Newton-Raphson may fail to converge. Variations of the algorithm overcome this
significant limitation by requiring that each step reduce $f = \tfrac{1}{2} \|F\|^2$. This is possible because each step is
in a descent direction for $f$. Thus, either the full step decreases $f$, or a smaller step in the same direction
can be found which decreases $f$. Both Newton-Raphson and its globally convergent extensions require the
existence of the Jacobian matrix. This condition is satisfied by the NOCs, so that these techniques are
applicable.
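
A minimal sketch of Newton-Raphson with step halving is given here for a two-equation toy system. It is not the NOC system: the functions `F` and `jac`, the starting point, and the simple halving rule are illustrative assumptions, and production line searches are more elaborate.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a * d = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def half_norm_sq(v):
    """Merit function f = 1/2 * |v|^2."""
    return 0.5 * sum(c * c for c in v)

def damped_newton(F, jac, x, tol=1e-20, max_iter=50):
    """Newton-Raphson in two dimensions with backtracking on f = 1/2 |F|^2."""
    for _ in range(max_iter):
        fx = F(x)
        if half_norm_sq(fx) < tol:
            break
        step = solve2(jac(x), [-c for c in fx])    # Jacobian * dx = -F
        lam = 1.0
        # Halve the step until the merit function decreases.
        while lam > 1e-10:
            trial = [xi + lam * di for xi, di in zip(x, step)]
            if half_norm_sq(F(trial)) < half_norm_sq(fx):
                break
            lam *= 0.5
        x = trial
    return x

# Toy system (hypothetical): x^2 + y^2 = 4 and x * y = 1.
F = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0]
jac = lambda v: [[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]]
root = damped_newton(F, jac, [2.0, 0.5])
```

Starting from the full Newton step and shrinking only when necessary preserves quadratic convergence near a root while guarding against divergence far from it.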

Even though the Jacobian matrix exists and its analytical form is available, its evaluation is
computationally intensive. Consequently, multidimensional secant methods, such as Broyden’s method, may be
more efficient than Newton-Raphson. A thorough discussion of this technique, as well as a reference to the
primary literature, may be found in Press, et al. [64].
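
For comparison, the following sketch applies Broyden's rank-one secant update to a two-equation toy system. The Jacobian is evaluated only once, at the starting point, and is then updated from function values alone. The toy system and the names are assumptions for illustration, and no line search or safeguard against a singular estimate is included.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a * d = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def broyden(F, B, x, tol=1e-10, max_iter=50):
    """Broyden's method in two dimensions; B is the initial Jacobian estimate."""
    B = [row[:] for row in B]                 # work on a copy of the estimate
    fx = F(x)
    for _ in range(max_iter):
        if max(abs(c) for c in fx) < tol:
            break
        dx = solve2(B, [-c for c in fx])      # secant step: B * dx = -F
        x = [xi + di for xi, di in zip(x, dx)]
        f_new = F(x)
        df = [fn - fo for fn, fo in zip(f_new, fx)]
        # Rank-one update: B += (df - B dx) dx^T / (dx . dx)
        b_dx = [sum(B[i][j] * dx[j] for j in range(2)) for i in range(2)]
        denom = sum(d * d for d in dx)
        for i in range(2):
            for j in range(2):
                B[i][j] += (df[i] - b_dx[i]) * dx[j] / denom
        fx = f_new
    return x

# Toy system (hypothetical): x^2 + y^2 = 4 and x * y = 1.
F = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0]
jac_at_start = [[4.0, 1.0], [0.5, 2.0]]       # analytic Jacobian at (2, 0.5)
root = broyden(F, jac_at_start, [2.0, 0.5])
```

Each iteration costs one evaluation of F instead of a full Jacobian evaluation, which is the source of the potential efficiency gain when the Jacobian is expensive.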

The exogenous parameter selection problem for generalized fast messy genetic algorithms may be
approached by identifying solutions to the NOCs and selecting the solution for which the cost functional $J$
is minimized. A parameter selection technique for generalized fast messy genetic algorithms may be stated
as shown in Figure 31. This technique satisfies the formal acceptability criteria established in Chapter I.

1. Let $x_0$ be the best currently known set of exogenous parameters for the fast
messy genetic algorithm.

2. While the termination condition is not satisfied:

(a) Obtain a solution $x$ of the NOCs.

(b) If $J(x) < J(x_0)$ then replace $x_0$ by $x$.

Figure 31. Generalized Fast Messy Genetic Algorithm Parameter Selection Technique Based on Solution
of the Necessary Optimality Conditions

The criteria are:

1. the technique guarantees expected effectiveness no worse than that resulting from the best set of
parameters obtained using existing techniques,

2. the technique requires no a priori knowledge of the optimal solution,

3. the technique requires no design parameters beyond those of the linkage-friendly genetic algorithm;
and

4. the computational effort required by the technique scales well with the effort required by the
linkage-friendly genetic algorithm.

The technique is essentially a simplistic search algorithm. It generates candidates from the set of
solutions of the NOCs, which includes the local maxima and saddle points of $J$, as well as the local minima.
The amount of computation required to obtain a solution to the NOCs is approximately that required to
obtain a local minimum of the cost function $J$. Thus, the technique is likely to be less efficient than one
which randomly generates candidates from the set of local minima (see Figure 32).

1. Let $x_0$ be the best currently known set of exogenous parameters for the fast
messy genetic algorithm.

2. While the termination condition is not satisfied:

(a) Obtain a point of local minimum $x$ for $J$.

(b) If $J(x) < J(x_0)$ then replace $x_0$ by $x$.

Figure 32. Generalized Fast Messy Genetic Algorithm Parameter Selection Technique Based on Optimization
of the Cost Function

Even more promising strategies result from the use of standard constrained optimum seeking techniques
to minimize $J$. The literature abounds with applicable techniques, including simulated annealing, tabu
search, and evolutionary algorithms. Because the cost function is continuously differentiable, it is worthwhile
to consider hybrid techniques which combine a globally convergent technique (e.g. genetic algorithms) with
an efficient local optimization technique (e.g. conjugate gradient). Such hybrids serve as effective optimum
seeking techniques for other objective functions with similar properties [55, 56].
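
The loop of Figure 32 can be sketched as a multi-start local search. In this illustration the local optimizer is plain projected gradient descent, a simpler substitute for the conjugate gradient and genetic algorithm hybrids mentioned above, and the one-parameter cost is a made-up multimodal function, not the gfmGA cost functional.

```python
def project(x):
    """Project onto the feasible set of the toy problem (x >= 0 componentwise)."""
    return [max(0.0, xi) for xi in x]

def local_minimum(grad, x, step=0.01, iters=5000):
    """Projected gradient descent from the starting point x."""
    for _ in range(iters):
        x = project([xi - step * gi for xi, gi in zip(x, grad(x))])
    return x

def multistart(cost, grad, x_best, starts):
    """Keep the best local minimum found over all starting points."""
    for start in starts:
        x = local_minimum(grad, start)        # step 2(a): obtain a local minimum
        if cost(x) < cost(x_best):            # step 2(b): keep the better point
            x_best = x
    return x_best

# Hypothetical cost with local minima near x = 0.94 and x = 2.94.
cost = lambda x: (x[0] - 1.0) ** 2 * (x[0] - 3.0) ** 2 + 0.5 * x[0]
grad = lambda x: [2.0 * (x[0] - 1.0) * (x[0] - 3.0) * (2.0 * x[0] - 4.0) + 0.5]

best = multistart(cost, grad, x_best=[4.0], starts=[[0.0], [2.5], [4.0]])
```

Because the incumbent is replaced only by a strictly better point, the result is never worse than the initial parameter set, which mirrors the first acceptability criterion.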

6.4 Summary

The linkage-friendly genetic algorithm exogenous parameter selection problem is formally posed as a
constrained optimization problem. By viewing the fast messy genetic algorithm parameter selection problem
in this way, a hillclimbing technique is obtained which represents a substantial improvement over existing
techniques. The Generalized Kuhn-Tucker Theorem is employed to obtain necessary optimality conditions
(NOCs) for the generalized fast messy genetic algorithm parameter selection problem. Several techniques
are suggested by which the problem may be solved in practice, including numerical solution of the NOCs
and a hybrid genetic algorithm which incorporates efficient local optimization.

VII. Conclusions and Recommendations


The primary objectives of this research are to

• mathematically model those properties of specific linkage-friendly genetic algorithms which are related
to expected effectiveness; and

• develop exogenous parameter selection techniques for those linkage-friendly genetic algorithms, focusing
on maximizing their expected effectiveness.

The major conclusions are summarized in Section 7.1, and recommendations for future research are
presented in Section 7.2.

7.1 Conclusions

Formal framework for evolutionary algorithms. Evolutionary algorithms are a class of stochastic
population-based algorithms which are commonly applied as optimum seeking techniques. A novel framework
for evolutionary algorithms is proposed which extends the work of Bäck and Schwefel (Section 2.3). Within
this formal framework, evolutionary operators are viewed as mappings from parameter spaces to random
population transformations. Definitions of recombination, mutation, and selection operators are proposed
which capture their distinguishing characteristics.

Linkage-friendly genetic algorithms (lfGAs). The class of lfGAs consists of evolutionary
algorithms which use order-invariant representation schemes and strictly invariant selection operators.
Previously studied examples of the class include the messy genetic algorithm (mGA) and the fast messy genetic
algorithm (fmGA), which are defined within the formal framework for evolutionary algorithms in Sections 2.6.3
and 2.6.4, respectively.

The mGA and fmGA represent theoretical steps towards effective linkage-friendly genetic algorithms.
However, the mGA is $O([\mathrm{card}(A) \cdot \ell]^k)$ in time and space, where $A$ is the genic alphabet, $\ell$ is the problem
size, and $k$ is the building block size. The fmGA addresses this drawback, but it also introduces numerous
exogenous parameters for which no practical selection methodology is known. Experience shows that the
effectiveness of the fmGA is highly sensitive to the choice of these exogenous parameters [28].

Chapter III proposes a novel lfGA, the generalized fast messy genetic algorithm (gfmGA), which uses
probabilistic generalizations of the filtering (mutation) and selection operators used by the fmGA. The fmGA
is a special case of the gfmGA. Consequently, existence is guaranteed of parameters for which the gfmGA
expected effectiveness is no worse than the best possible fmGA expected effectiveness (Section 3.3).

Dynamical systems models of probabilistic operators. The practical application of the fmGA is
limited by the lack of an acceptable parameter selection methodology. Existing techniques are handicapped
by a poor understanding of the relationship between the filtering and thresholding parameters of the
algorithm and the expected effectiveness. This research develops a dynamical systems model of the gfmGA (and
of the fmGA as a special case) which predicts the expected effectiveness as a function of the filtering and
thresholding parameters. The key elements of the model are:

1. Probability of building block presence after probabilistic filtering. Previous models of
building block filtering considered only deterministic and destructive filtering. This research (Chapter IV)
extends these models to consider probabilistic and possibly increasing individual lengths. Probabilities
of survival and construction are combined to yield the total probability of building block presence
following filtering.

2. Order statistical analysis of the probability of correct decision making. Early linkage-friendly
genetic algorithm studies aim at improving probabilities of correct decision making (whether or not this
is explicitly stated), but those probabilities are inadequately modeled. Previous models of tournament
selection focus on either takeover time or selection intensity. Neither model provides information
regarding the relative growth of one class of individuals with respect to another (except the growth of
the “best” individuals with respect to the “worst” individuals). This research develops the probability
of correct decision making exactly and explicitly in terms of the initial fitness distributions of the
competing classes and the number of ancestors belonging to each of the classes for each competitor
(Section 5.1).

3.  Markov  chain  analysis  of  competition  in  the  presence  of  non-trivial  thresholding.  Another 
limitation  of  previous  tournament  selection  models  is  that  they  neglect  thresholding.  By  affecting 
the  pairs  of  individuals  which  are  considered  compatible,  thresholding  affects  not  only  the  effective 
probability  of  correct  decision  making,  but  also  the  effective  selective  pressure.  This  research  uses 
Markov  chain  analysis  to  develop  an  exact  dynamical  systems  model  of  competing  classes  of  individuals 
under  binary  tournament  selection  with  (probabilistic)  thresholding  (Section  5.2). 
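
The order-statistical idea in item 2 can be illustrated with a minimal sketch. It is not the model of Section 5.1. It assumes, hypothetically, that each competitor's fitness is the largest of several independent draws (one per ancestor) from a normal class distribution, and that the larger fitness wins the tournament. The function name `p_correct` and all parameter names are introduced here for illustration only.

```python
import math

def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_correct(mu_a, sigma_a, m, mu_b, sigma_b, n, steps=20000):
    """P(max of m draws from class A exceeds max of n draws from class B).

    Assumption (not from the source): independent normal draws, larger wins.
    Uses the maximal order statistic density m * f(x) * F(x)**(m - 1)
    and integrates it against G(x)**n with the trapezoid rule.
    """
    lo = min(mu_a - 10.0 * sigma_a, mu_b - 10.0 * sigma_b)
    hi = max(mu_a + 10.0 * sigma_a, mu_b + 10.0 * sigma_b)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        value = (m * norm_pdf(x, mu_a, sigma_a)
                 * norm_cdf(x, mu_a, sigma_a) ** (m - 1)
                 * norm_cdf(x, mu_b, sigma_b) ** n)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end weights
        total += weight * value
    return total * h
```

Under these assumptions, two classes with identical distributions give `p_correct(0, 1, m, 0, 1, n)` equal to m / (m + n), which is a convenient sanity check on the integration.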

Parameter selection techniques based on maximizing expected effectiveness. The mathematical
model developed permits the design of parameter selection techniques which explicitly consider
the  expected  effectiveness  of  the  algorithm.  This  research  considers  a  parameter  selection  technique  to  be 
acceptable  if  it  satisfies  the  following  criteria: 

1. the technique guarantees expected effectiveness no worse than that resulting from the best set of
parameters obtained using existing techniques;

2. the technique requires no a priori knowledge of the optimal solution;

3.  the  technique  requires  no  design  parameters  beyond  those  of  the  linkage-friendly  genetic  algorithm; 
and 

4.  the  computational  effort  required  by  the  technique  scales  well  with  the  effort  required  by  the  linkage- 
friendly  genetic  algorithm. 

The  parameter  selection  problem  is  formally  posed  as  a  constrained  optimization  problem  (Section  6.1). 
An  fmGA  parameter  selection  technique  based  on  hill-climbing  is  proposed  which  satisfies  the  acceptability 
criteria  (Section  6.2).  In  part  because  the  gfmGA  parameters  are  real-valued,  vector  space  optimization 
techniques  (specifically,  the  Generalized  Kuhn-Tucker  Theorem)  may  be  used  to  obtain  formal  necessary 
optimality conditions (NOCs) for the gfmGA parameters (Section 6.3). One gfmGA parameter selection
technique is proposed which is based on numerical solution of the NOCs. A second technique is proposed
based  on  computational  optimization  of  the  cost  functional. 
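
A hill-climbing search over discrete parameters can be sketched as follows. This is a generic illustration, not the technique of Section 6.2: the function `hill_climb`, the bounds, and the toy objective `toy_effectiveness` are hypothetical, and the objective merely stands in for a model-predicted expected effectiveness. Because a move is accepted only when it improves the objective, the result is never worse than the starting point, so starting from parameters produced by an existing technique is one way a method of this kind could meet the first acceptability criterion.

```python
def hill_climb(effectiveness, start, lower, upper, max_iters=1000):
    """Greedy ascent over integer parameter vectors within box bounds.

    effectiveness: callable mapping a list of ints to a number (higher is better).
    start, lower, upper: lists of ints of equal length.
    Returns (best_parameters, best_value).
    """
    current = list(start)
    best = effectiveness(current)
    for _ in range(max_iters):
        improved = False
        for i in range(len(current)):
            for step in (-1, 1):
                candidate = list(current)
                candidate[i] += step
                if not (lower[i] <= candidate[i] <= upper[i]):
                    continue  # respect the constraints on each parameter
                value = effectiveness(candidate)
                if value > best:
                    # accept any improving neighbor as soon as it is found
                    current, best, improved = candidate, value, True
        if not improved:
            break  # no neighbor improves: a local optimum has been reached
    return current, best

def toy_effectiveness(p):
    """Hypothetical stand-in objective with its maximum at (7, 3)."""
    return -((p[0] - 7) ** 2 + (p[1] - 3) ** 2)

result = hill_climb(toy_effectiveness, [0, 0], [0, 0], [10, 10])
```

Like any hill-climbing method, the sketch stops at a local optimum, so the quality of the result depends on the starting parameters and on the shape of the objective.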

1.2  Recommendations 

This research answers a number of questions regarding the properties of linkage-friendly genetic algorithms.
It also suggests a number of promising areas for additional research:

1.  Fitness  distributions  after  building  block  filtering.  The  building  block  filtering  model  developed 
in  Chapter  IV  considers  only  the  probability  of  building  block  presence  after  filtering.  It  provides  no 
information  regarding  the  resulting  fitness  distributions.  The  availability  of  such  information  would 
provide  the  initial  fitness  distributions  required  to  model  the  tournament  selection  episode  following 
the  filtering  event. 

2.  Non-monotonicity  of  the  probability  of  correct  decision  making.  The  use  of  thresholding  in 
binary  tournament  selection  is  predicated  on  the  assumption  that  the  probability  of  correct  decision 
making  depends  on  the  thresholding  metric.  In  particular,  the  messy  genetic  algorithm  and  fast  messy 
genetic  algorithm  implicitly  assume  that  pd  is  a  non-decreasing  function  of  the  number  of  common 
defining  loci.  Limited  empirical  results  (not  reported  here)  based  on  the  tournament  selection  model 
developed  in  Chapter  V  suggest  that  this  assumption  is  incorrect.  These  results  suggest  that  better 
decision making may result from a compatibility criterion which places both upper and lower bounds
on the number of common defining loci.

3. Extension of tournament selection model to competition between N classes. The mathematical
model of tournament selection developed in this research (Sections 5.1 and 5.2) focuses on
competition between two classes of individuals. It is natural to extend the model to competition between
N classes of individuals. Such an extension would, for example, facilitate more accurate modeling
of  the  effects  of  the  presence  of  individuals  which  contain  multiple  building  blocks. 

4. Efficiency considerations of parameter selection. This research considered only effectiveness
in the definition of algorithm performance. Another important aspect of performance is efficiency.
Future research should examine the appropriate definition of an efficiency functional, and the appropriate
means by which to consider both effectiveness and efficiency in selecting exogenous parameters.
Performance  may  be  defined  as  a  convex  combination  of  effectiveness  and  efficiency.  Alternatively,  the 
parameter  selection  problem  may  be  viewed  as  a  multi-objective  optimization  problem. 

5.  Application  of  the  gfmGA  to  practical  problems.  Future  research  also  includes  application 
of  the  gfmGA  to  real  world  problems,  such  as  the  polypeptide  structure  prediction  problem.  The 
AFIT/WL  Genetic  Computation  Techniques  (AGCT)  research  group  performs  a  number  of  state-of- 
the-art  investigations  in  the  application  of  evolutionary  algorithms  to  this  problem  (e.g.  [54]).  Several 
issues  must  be  addressed. 

• The inherently discrete nature of gfmGA individual spaces strongly suggests that reasonable effectiveness
may be expected only for objective functions with a combinatoric character. The
polypeptide structure prediction problem exhibits both combinatoric and continuous characteristics,
which suggests hybridization of the gfmGA with efficient local minimization techniques
(cf. [56]).

•  The  computational  resources  necessary  to  solve  a  real-world  polypeptide  structure  prediction 
problem  require  the  use  of  high-performance  scalable  architectures.  Existing  mappings  of  the 
fmGA  to  such  architectures  (e.g.  [28])  provide  a  reasonable  point  of  departure  for  determining 
appropriate  mappings  of  the  gfmGA.  Appropriate  mappings  of  the  parameter  selection  techniques 
to  scalable  architectures  are  also  required. 

•  The  prediction  of  expected  effectiveness,  and  consequently  the  selection  of  gfmGA  parameters, 
requires estimation of the initial fitness distributions. This estimate may be obtained by assumptions
based on physical insight (e.g. distributional form, signal difference), and parameter
estimates  based  on  a  uniform  sampling  of  conformation  space  (e.g.  mean,  variance). 

Appendix A. Cardinalities for Decision Making Model


The distributions of fitnesses in a uniform random population developed in Section 5.3 are expressed in
terms of certain cardinalities of subsets of I and P, where I is the individual space. This appendix presents
expressions for these cardinalities. |A| denotes the cardinality of A.

Appendix B. Numerical Techniques Used in Tournament Selection Experiments

The computational experiments reported in Chapter V assume that the unconditional fitness densities
$f$ and $g$ and the conditional fitness densities $f_\Omega$ and $g_\Omega$ of the competing classes are those of the normal
distributions $N(\mu_A, \sigma_A^2)$, $N(\mu_B, \sigma_B^2)$, $N(\mu_{A|\Omega}, \sigma_{A|\Omega}^2)$, and $N(\mu_{B|\Omega}, \sigma_{B|\Omega}^2)$, respectively. Consequently, the
probabilities of correct decision making for individuals with more than one ancestor do not have closed form
solutions. This appendix discusses the numerical techniques used to compute the probabilities.

The integral

[equation missing in source]

may be formally expressed as

[equation missing in source]

where

[equation missing in source]

Because the integral operator is additive with respect to the interval of integration,

$$I_1(-\infty, x) = I_1(-\infty, a_0) + \sum_{i=1}^{N} I_1(a_{i-1}, a_i) + I_1(a_N, x).$$

Each of the integrals is evaluated numerically, using Romberg integration [64]. The first integral is improper,
and is evaluated via the change of variable

[equation missing in source]

which yields

[equation missing in source]

for $a > \mu_{B|\Omega}$. The $a_i$'s are chosen from $\{\mu_A - \sigma_A,\ \mu_A + \sigma_A,\ \mu_B - \sigma_B,\ \mu_B + \sigma_B,\ \mu_{B|\Omega} - \sigma_{B|\Omega},\ \mu_{B|\Omega} + \sigma_{B|\Omega}\}$
to satisfy $-\infty < a_0 < \cdots < a_N < x$. The integral $I$ is then evaluated using Gaussian quadrature [64] via
Gauss-Hermite polynomials.
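
The interval splitting and Romberg evaluation described above can be sketched as follows. The sketch covers only an inner integral of the form I_1(-∞, x) for a generic density; the outer Gauss-Hermite step is not shown. Since the change of variable used in the source is not legible here, the substitution t = a_0 - u/(1 - u), which maps (-∞, a_0] onto [0, 1), is an assumption, as are the function names `romberg`, `lower_tail`, and `integral_to_x`.

```python
import math

def romberg(func, a, b, tol=1e-10, max_levels=16):
    """Romberg integration of func on [a, b] (trapezoid rule plus Richardson)."""
    table = [[0.5 * (b - a) * (func(a) + func(b))]]
    n = 1
    for k in range(1, max_levels):
        n *= 2
        h = (b - a) / n
        # refine the trapezoid estimate using only the newly added midpoints
        new_points = sum(func(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))
        row = [0.5 * table[k - 1][0] + h * new_points]
        for m in range(1, k + 1):
            row.append(row[m - 1] + (row[m - 1] - table[k - 1][m - 1]) / (4 ** m - 1))
        table.append(row)
        # require a few levels before trusting the convergence test
        if k >= 4 and abs(row[k] - table[k - 1][k - 1]) <= tol * max(1.0, abs(row[k])):
            return row[k]
    return table[-1][-1]

def lower_tail(density, a0):
    """Integral of density over (-inf, a0] via the assumed map t = a0 - u/(1-u)."""
    def transformed(u):
        if u >= 1.0:
            return 0.0  # assumed limit: the density decays faster than 1/t^2
        return density(a0 - u / (1.0 - u)) / (1.0 - u) ** 2
    return romberg(transformed, 0.0, 1.0)

def integral_to_x(density, breakpoints, x):
    """I_1(-inf, x) as a sum over (-inf, a_0], [a_{i-1}, a_i], and [a_N, x]."""
    points = sorted(a for a in breakpoints if a < x)
    if not points:
        return lower_tail(density, x)
    total = lower_tail(density, points[0])
    for left, right in zip(points, points[1:]):
        total += romberg(density, left, right)
    return total + romberg(density, points[-1], x)
```

Placing breakpoints one standard deviation either side of each mean keeps every finite piece short relative to the scale on which the normal densities vary, which is presumably why those points are used as the a_i.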

The integral

[equation missing in source]

may be expressed as

$$I_1(-\infty, \infty) = I_1(-\infty, a_0) + \sum_{i=1}^{N} I_1(a_{i-1}, a_i) + I_1(a_N, \infty),$$

and each integral is evaluated using Romberg integration, with the $a_i$'s chosen as above. The first and last
integrals are both improper, and are evaluated using the above change of variable technique. Finally, the

[text and equation missing in source]

and each integral is evaluated using Romberg integration, with the $a_i$'s chosen from $\{\mu_A - \sigma_A,\ \mu_A + \sigma_A,\ \mu_B - \sigma_B,\ \mu_B + \sigma_B,\ \mu_{A|\Omega} - \sigma_{A|\Omega},\ \mu_{A|\Omega} + \sigma_{A|\Omega}\}$
to satisfy $-\infty < a_0 < \cdots < a_N < x$. Again, the first and last
integrals are both improper, and are evaluated via the change of variable

[equation missing in source]